Merge pull request #8209: [BEAM-6990] Use CoderTranslation for resolving coders in cross-language configuration
diff --git a/.mailmap b/.mailmap
new file mode 100644
index 0000000..be5e16c
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1,580 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+Abbass Marouni <amarouni@talend.com>
+Abdullah Bashir <mabdullah353@gmail.com>
+Adam Horky <adam.horky@firma.seznam.cz>
+Adam Horky <horky.adam@gmail.com>
+Ahmet Altay <aaltay@gmail.com>
+Ahmet Altay <altay@altay-macbookpro2.roam.corp.google.com>
+Ahmet Altay <altay@google.com>
+Alan Myrvold <alan@Alans-MacBook.local>
+Alan Myrvold <alan.myrvold@comcast.net>
+Alan Myrvold <amyrvold@amyrvold-macbookpro.roam.corp.google.com>
+Alan Myrvold <amyrvold@google.com>
+Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com>
+Alex Amato <ajamato@ajamato2016.sea.corp.google.com>
+Alex Amato <ajamato@ajamato-linux0.sea.corp.google.com>
+Alex Amato <ajamato@google.com>
+Alexander Dejanovski <alex@thelastpickle.com>
+Alexander Hoem Rosbach <alexander@rosbach.no>
+Alexey Diomin <diominay@gmail.com>
+Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com>
+Alexey Romanenko <aromanenko-dev@gmail.com>
+Alexey Romanenko <aromanenko.dev@gmail.com>
+Alex Filatov <alex-filatov@users.noreply.github.com>
+Alex Van Boxel <alex@vanboxel.be>
+Aljoscha Krettek <aljoscha.krettek@gmail.com>
+Amit Sela <amitsela33@gmail.com>
+Amy Unruh <amyu@google.com>
+Anders Johnson <andersjohnson@google.com>
+Andrea Foegler <foegler@foegler-macbookpro3.roam.corp.google.com>
+Andrea Foegler <foegler@google.com>
+Andreas Ehrencrona <andreas.ehrencrona@velik.it>
+andreich <andreich@google.com>
+Andrew Brampton <bramp@google.com>
+Andrew Fulton <afulton@me.com>
+Andrew Fulton <andrew@flumaion.com>
+Andrew Martin <amartin@spotify.com>
+Andrew Martin <andrewsmartin.mg@gmail.com>
+Andrew Pilloud <apilloud@google.com>
+Andrew Pilloud <apilloud@users.noreply.github.com>
+Aneesh Naman <aneeshnaman@google.com>
+Anil Muppalla <anil@spotify.com>
+Ankit Jhalaria <ajhalaria@godaddy.com>
+Ankur Goenka <angoenka@users.noreply.github.com>
+Ankur Goenka <ankurgoenka@gmail.com>
+Ankur Goenka <goenka@goenka.svl.corp.google.com>
+Antonio D'souza <adsouza@gmail.com>
+Anton Kedin <33067037+akedin@users.noreply.github.com>
+Anton Kedin <kedin@google.com>
+Anton Kedin <kedin@kedin-macbookpro.roam.corp.google.com>
+Arnaud <arnaudfournier921@gmail.com>
+ArnaudFnr <arnaudfournier021@gmail.com>
+Arnaud Fournier <arnaudfournier921@gmail.com>
+Arun Sethia <sethia.arun@gmail.com>
+Asha Rostamianfar <arostami@google.com>
+Austin.Bennett <austin.bennett@mbp.Academy-AP>
+Aviem Zur <aviemzur@gmail.com>
+Axel Magnuson <axelmagn@gmail.com>
+Batkhuyag Batsaikhan <batbat@google.com>
+Ben Chambers <bchambers@bchambers-macbookpro2.roam.corp.google.com>
+Ben Chambers <bchambers@google.com>
+Ben Chambers <bjchambers@users.noreply.github.com>
+Ben Sidhom <bsidhom@gmail.com>
+Ben Sidhom <sidhom@google.com>
+Ben Song <songb@google.com>
+Benson Margulies <bimargulies@bimargulies.sea.corp.google.com>
+Benson Margulies <bimargulies@google.com>
+Bill Neubauer <wcn@google.com>
+Bingfeng Shu <44354306+BingfengShu@users.noreply.github.com>
+Bingfeng Shu <bshu@bshu.svl.corp.google.com>
+Borisa Zivkovic <borisa.zivkovic@huawei.com>
+Boyuan Zhang <36090911+boyuanzz@users.noreply.github.com>
+Boyuan Zhang <boyuan@google.com>
+Boyuan Zhang <boyuanz@google.com>
+Braden Bassingthwaite <bbassingthwaite@vendasta.com>
+Brian Foo <bfoo@bfoo-macbookpro.roam.corp.google.com>
+Brian Hulette <bhulette@google.com>
+Brian Hulette <hulettbh@gmail.com>
+Brian Martin <brianmartin@gmail.com>
+Brian Quinlan <brian@sweetapp.com>
+Cade Markegard <cademarkegard@gmail.com>
+Cao Manh Dat <datcm@apache.org>
+Carl McGraw <carlm@accretivetg.com>
+Carlos Alonso <carlos.alonso@cabify.com>
+Chaim Turkel <cyturel@gmail.com>
+Chamikara Jayalath <chamikara@apache.org>
+Chamikara Jayalath <chamikara@chamikara-linux.svl.corp.google.com>
+Chamikara Jayalath <chamikara@google.com>
+Chandni Singh <chandni.singh@capitalone.com>
+Chang chen <baibaichen@gmail.com>
+Charles Chen <ccy@google.com>
+Charles Chen <charlesccychen@users.noreply.github.com>
+Chen Bin <bchen@talend.com>
+Chikanaga Tomoyuki <t-chikanaga@groovenauts.jp>
+Chinmay Kolhatkar <chinmay@apache.org>
+Chris Broadfoot <cbro@golang.org>
+Christian Hudon <chrish@pianocktail.org>
+Christian Schneider <chris@die-schneider.net>
+Chuan Yu Foo <1147435+chuanyu@users.noreply.github.com>
+Chuan Yu Foo <cyfoo@google.com>
+Cindy Kuhn <ckuhn@google.com>
+Cody Schroeder <schroederc@google.com>
+Colin Phipps <cph@moria.org.uk>
+Colin Phipps <fipsy@google.com>
+colinreid <colinreid@google.com>
+Colm O hEigeartaigh <coheigea@apache.org>
+Colm O hEigeartaigh <coheigea@users.noreply.github.com>
+Connell O'Callahan <40410951+connelloG@users.noreply.github.com>
+Cory Brzycki <supercclank@gmail.com>
+Cory Brzycki <supercclank@google.com>
+Craig Chambers <45049052+CraigChambersG@users.noreply.github.com>
+Craig Chambers <chambers@google.com>
+Craig Citro <craigcitro@google.com>
+Cristian <me@cristian.io>
+Damien Gouyette <damien.gouyette@gmail.com>
+Dan Duong <danduong@google.com>
+Daniel Halperin <daniel@halper.in>
+Daniel Halperin <dhalperi@google.com>
+Daniel Halperin <dhalperi@users.noreply.github.com>
+Daniel Kulp <dkulp@apache.org>
+Daniel Mills <millsd@google.com>
+Daniel Mills <millsd@millsd.sea.corp.google.com>
+Daniel Norberg <dano@spotify.com>
+Daniel Oliveira <daniel.o.programmer@gmail.com>
+Daniel Oliveira <younghoono@gmail.com>
+Dan Ringwalt <ringwalt@google.com>
+Dariusz Aniszewski <dariusz@aniszewski.eu>
+Dariusz Aniszewski <dariusz.aniszewski@polidea.com>
+Dat Tran <dattran@sentifi.com>
+David Alves <david.alves@cloudera.com>
+DavidB <david.billings@gandlake.com>
+David Cavazos <davido262@gmail.com>
+David Cavazos <dcavazos@google.com>
+David Desberg <david.desberg@uber.com>
+David Hrbacek <david.hrbacek@firma.seznam.cz>
+David Moravek <david.moravek@firma.seznam.cz>
+David Moravek <david.moravek@gmail.com>
+David Rieber <drieber@google.com>
+David Sabater <david.sabater@gmail.com>
+David Sabater Dinter <david.sabater@gmail.com>
+David Volquartz Lebech <david@lebech.info>
+David Yan <davidyan@apache.org>
+Davor Bonaci <davorbonaci@users.noreply.github.com>
+Davor Bonaci <davor@davor-macbookair.roam.corp.google.com>
+Davor Bonaci <davor@google.com>
+Dawid Wysakowicz <dawid@getindata.com>
+Dennis Huo <dhuo@google.com>
+Derek Perez <pzd@google.com>
+Devin Donnelly <ddonnelly@google.com>
+Devon Meunier <devon.meunier@shopify.com>
+Dipti Kulkarni <dipti_dkulkarni@persistent.co.in>
+Dmytro Ivanov <dimon.ivan@gmail.com>
+Dusan Rychnovsky <dusan.rychnovsky@firma.seznam.cz>
+Dustin Rhodes <dcrhodes@google.com>
+Ed Hartwell Goose <ed@mention-me.com>
+Elliott Brossard <elliottb@google.com>
+Eric Anderson <eandersonm@gmail.com>
+Eric Beach <ebeach@google.com>
+Eric Roshan-Eisner <ede@alum.mit.edu>
+Eric Roshan-Eisner <edre@google.com>
+Etienne Chauchot and Jean-Baptiste Onofré <echauchot@gmail.com+jbonofre@apache.org>
+Etienne Chauchot <echauchot@apache.org>
+Etienne Chauchot <echauchot@gmail.com>
+Eugene Kirpichov <ekirpichov@gmail.com>
+Eugene Kirpichov <kirpichov@google.com>
+Exprosed <larryruili@gmail.com>
+Fabien Rousseau <fabien.rousseau@happn.com>
+Flavio Fiszman <flaviocf@flaviocf-macbookpro.roam.corp.google.com>
+Flavio Fiszman <flaviocf@google.com>
+Frances Perry <fjp@google.com>
+Frances Perry <francesperry@users.noreply.github.com>
+Frank Yellin <fy@fyellin.com>
+Gareth Western <gareth@garethwestern.com>
+Garrett Jones <garrettjonesgoogle@users.noreply.github.com>
+Gaurav Gupta <gaugupt3@cisco.com>
+Geet Kumar <geet.kumar75@gmail.com>
+Geet Kumar <gkumar7@users.noreply.github.com>
+Gene Peters <gene@telligent-data.com>
+Gergely Novak <gnovak@hortonworks.com>
+Gleb Kanterov <gleb@spotify.com>
+Gleb Kanterov <kanterov@users.noreply.github.com>
+Glenn Ammons <ammons@google.com>
+Grzegorz Kołakowski <grzegorz.kolakowski@getindata.com>
+Guillaume Blaquiere <guillaume.blaquiere@hrsys.Fr>
+Gus Katsiapis <katsiapis@katsiapis-linux.mtv.corp.google.com>
+Hadar Hod <hadarh@google.com>
+Hai Lu <halu@linkedin.com>
+Harsh Vardhan <ananvay@google.com>
+Harsh Vardhan <ananvay2000@yahoo.com>
+Heejong Lee <heejong@gmail.com>
+Henning Rohde <herohde@google.com>
+Henning Rohde <herohde@seekerror.org>
+Henry Deist <hdeist@google.com>
+Henry Suryawirawan <hsuryawirawan@google.com>
+Holden Karau <holdenkarau@google.com>
+Holden Karau <holden@pigscanfly.ca>
+Holden Karau <holden@us.ibm.com>
+Ho Tien Vu <ho0001vu@gmail.com>
+Huygaa Batsaikhan <batbat@batbat-linuxworkstation.sea.corp.google.com>
+Huygaa Batsaikhan <batbat@google.com>
+Ian Zhou <ianzhou@google.com>
+Igor Bernstein <igorbernstein@google.com>
+Ilya Figotin <ifigotin@gmail.com>
+Ilya Ganelin <ilya.ganelin@capitalone.com>
+Innocent Djiofack <djiofack007@gmail.com>
+Ismaël Mejía <iemejia@apache.org>
+Ismaël Mejía <iemejia@gmail.com>
+Itamar Ostricher <itamarost@gmail.com>
+Jack Hsueh <jhsueh@opengov.com>
+Jacky <jackyq2015@gmail.com>
+Jacob Marble <jmarble@kochava.com>
+Jakob Homan <jghoman@gmail.com>
+James Malone <jamalone@gmail.com>
+James Malone <jamesmalone@google.com>
+James Xu <xumingmingv@gmail.com>
+Jan Lukavsky <jan.lukavsky@firma.seznam.cz>
+Jan Lukavsky <jan.lukavsky@o2.cz>
+Jan Lukavsky <je.ik@seznam.cz>
+Jan Lukavský <je.ik@seznam.cz>
+Jára Vaněk <jaromir.vanek2@firma.seznam.cz>
+Jaromir Vanek <vanek.jaromir@gmail.com>
+Jason Dobry <jdobry@google.com>
+Jason Kuster <jason@google.com>
+Jason Kuster <jasonkuster@google.com>
+Jason White <jason.white@shopify.com>
+Javier Antonio Gonzalez Trejo <javier.antonio.gonzalez.trejo@gmail.com>
+Javier Moreno <bluelephant@gmail.com>
+Jean-Baptiste Onofré <jb@nanthrax.net>
+Jean-Baptiste Onofré <jbonofre@apache.org>
+Jean-Philippe Martin <jpmartin@google.com>
+Jeff Gardner <gardnerj@google.com>
+Jeff Klukas <jeff@klukas.net>
+Jeffrey Scott Keone Payne <jeffkpayne@gmail.com>
+Jeremie Lenfant-Engelmann <jeremiele@google.com>
+Jeremy Hurwitz <hurwitz@google.com>
+Jeremy Lewi <jlewi@google.com>
+Jeremy Weinstein <jeremydw@gmail.com>
+Jeroen Steggink <jsteggink@users.noreply.github.com>
+Jesse Anderson <jesse@smokinghand.com>
+Jianfeng Qian <Jianfeng.Qian@outlook.com>
+JiJun Tang <tangjijun@yhd.com>
+Jingsong Li <lzljs3620320@aliyun.com>
+Jins George <jins.george@aeris.net>
+Joachim van der Herten <joachim.vanderherten@ugent.be>
+João Cabrita <kewne@protonmail.com>
+Joar Wandborg <joar@wandborg.se>
+Joey Baruch <joey.baruch@gmail.com>
+John MacMillan <johnmac@ca.ibm.com>
+Josh <joshformangornall@gmail.com>
+Joshua Litt <joshualitt@google.com>
+Joshua Litt <joshualitt@joshualitt.mtv.corp.google.com>
+Josh Wills <josh.wills@gmail.com>
+Josh Wills <jwills@cloudera.com>
+Jozef Vilcek <Jozef.Vilcek@sizmek.com>
+Jozef Vilcek <jozo.vilcek@gmail.com>
+Juan Rael <juan@qlogic.io>
+Julien Phalip <jphalip@gmail.com>
+Julien Tournay <boudhevil@gmail.com>
+Juliet Hougland <juliet@cloudera.com>
+Justin Tumale <fjetumale@gmail.com>
+Kadir Cetinkaya <kadircet@google.com>
+Kai Jiang <jiangkai@gmail.com>
+Kamil Szewczyk <szewinho@gmail.com>
+Karen Nino <knino@google.com>
+Kasia Kucharczyk <2536609+kkucharc@users.noreply.github.com>
+Kasia Kucharczyk <katarzyna.kucharczyk@polidea.com>
+Keiji Yoshida <keijiyoshida.mail@gmail.com>
+Keisuke Kondo <keisuke.kondo@istellar.jp>
+Keith McNeill <mcneill@anibla.net>
+Kelly Westbrooks <kwestbrooks@google.com>
+Kengo Seki <sekikn@apache.org>
+Kenneth Jung <kmj@google.com>
+Kenneth Knowles <kenn@apache.org>
+Kenneth Knowles <kenn@kennknowles.com>
+Kenneth Knowles <klk@google.com>
+Kevin Graney <kmg@google.com>
+Kevin Graney <nanonet@gmail.com>
+Kevin Peterson <kpeterson@nestlabs.com>
+Kevin Si <kevinsi@google.com>
+Kevin Sookocheff <kevin.sookocheff@workiva.com>
+Kirill Kozlov <kozlov.k.e@gmail.com>
+Kobi Salant <kobi.salant@gmail.com>
+Kobi Salant <ksalant@payapal.com>
+Kostas Kloudas <kkloudas@gmail.com>
+Kris Hildrum <hildrum@google.com>
+Krzysztof Trubalski <k.trubalski@ocado.com>
+Kurt Kluever <kak@google.com>
+Kyle Weaver <kcweaver@google.com>
+Kyle Winkelman <kyle.winkelman@optum.com>
+Lara Schmidt <laraschmidt@google.com>
+Leen Toelen <leen.toelen@tomtom.com>
+Leen Toelen <toelen@gmail.com>
+Levi Bowman <lbowman@gmail.com>
+Liam Miller-Cushon <cushon@google.com>
+Liang Zhang <liangzhang@google.com>
+Logan HAUSPIE <logan.hauspie.pro@gmail.com>
+Lorenzo Caggioni <lorenzo.caggioni@gmail.com>
+Lucas Amorim <lucasamorim@Lucass-MacBook-Pro.local>
+Lucas Amorim <lucasamorim@Lucass-MBP.hitronhub.home>
+Lucas Amorim <lucasamorim@protonmail.com>
+Luis Enrique Ortíz Ramirez <luisortramirez2211@gmail.com>
+Luis Osa <luis.osa.gdc@gmail.com>
+Lukas Drbal <lukas.drbal@gmail.com>
+Lukasz Cwik <lcwik@google.com>
+Lukasz Cwik <lukecwik@gmail.com>
+Łukasz Gajowy <lukasz.gajowy@gmail.com>
+Łukasz Gajowy <lukasz.gajowy@polidea.com>
+Luke Cwik <lcwik@google.com>
+Luke Cwik <lcwik@visitor-lcwik.wat.corp.google.com>
+Luke Cwik <lukecwik@gmail.com>
+Luke Zhu <luke.l.zhu@gmail.com>
+Luke Zhu <luke_zhu@brown.edu>
+Magnus Runesson <magru@spotify.com>
+Magnus Runesson <M.Runesson@gmail.com>
+Mairbek Khadikov <mairbek@google.com>
+Mairbek Khadikov <mkhadikov@gmail.com>
+Malo Denielou <malo@google.com>
+Manuel Fahndrich <fahndrich@google.com>
+Manu Zhang <owenzhang1990@gmail.com>
+Manu Zhang <OwenZhang1990@gmail.com>
+Marco Buccini <mbuccini@talend.com>
+Marek Simunek <marek.simunek@firma.seznam.cz>
+Marek Simunek <marek-simunek@seznam.cz>
+Maria Garcia Herrero <mariagh@google.com>
+María García Herrero <mariagh@mariagh.svl.corp.google.com>
+Marian Dvorsky <mariand@google.com>
+Maria Python <mariapython@users.noreply.github.com>
+Mark Daoust <markdaoust@google.com>
+Mark Liu <markflyhigh@users.noreply.github.com>
+Mark Liu <markliu@google.com>
+Mark Liu <markliu@markliu0.mtv.corp.google.com>
+Mark Liu <markliu@markliu-macbookpro.roam.corp.google.com>
+Mark Liu <markliu@markliu.svl.corp.google.com>
+Mark Shields <markshields@google.com>
+Mārtiņš Kalvāns <martins.kalvans@gmail.com>
+Martin Suchanek <mrtn@nrd.io>
+Márton Elek <elek@users.noreply.github.com>
+Mathieu Blanchard <mathieu.blanchard@happn.fr>
+Matt Austern <austern@google.com>
+Matthew Jones <mlety2@gmail.com>
+Matthias Baetens <baetensmatthias@gmail.com>
+Matthias Feys <matthiasfeys@gmail.com>
+Matthias Feys <matthiasfeys@hotmail.com>
+Matthias Feys <matthias@ml6.eu>
+Matthias Wessendorf <matzew@apache.org>
+Matt Lang <mattlang@google.com>
+Matt Lee <mattl@users.noreply.github.com>
+Maximilian Michels <max@posteo.de>
+Maximilian Michels <mxm@apache.org>
+Max <max@posteo.de>
+Max Shytikov <mshytikov@gmail.com>
+Melissa Pashniak <melissapa@google.com>
+Mergebot <mergebot@apache.org>
+Micah Wylde <micah@micahw.com>
+Micah Wylde <mwylde@lyft.com>
+Michael Luckey <25622840+adude3141@users.noreply.github.com>
+Michael Luckey <michael.luckey@ext.gfk.com>
+Michal Walenia <32354134+mwalenia@users.noreply.github.com>
+Michal Walenia <michal.walenia@polidea.com>
+Mike Pedersen <mike@mikepedersen.dk>
+Mike Pedersen <noctune9@gmail.com>
+Mikhail Gryzykhin <12602502+Ardagan@users.noreply.github.com>
+Mikhail Gryzykhin <gryzykhin.mikhail@gmail.com>
+Mikhail Gryzykhin <migryz@google.com>
+Mikhail Shmulyan <mshmulyan@google.com>
+Miles Saul <msaul@google.com>
+Miles Saul <msaul@msaul0.wat.corp.google.com>
+Mitch Shanklin <mshanklin@google.com>
+Motty Gruda <Mottyg1@gmail.com>
+Nathan Howell <nhowell@godaddy.com>
+Nawaid Shamim <nawaid.shamim@bbc.co.uk>
+Neda Mirian <nedam@google.com>
+Neda Mirian <neda.mirian@gmail.com>
+Neelesh Srinivas Salian <nsalian@cloudera.com>
+Neville Li <neville.lyh@gmail.com>
+Neville Li <neville@spotify.com>
+Niel Markwick <nielm@google.com>
+Niel Markwick <nielm@users.noreply.github.com>
+Niels Basjes <nbasjes@bol.com>
+Niels Basjes <niels@basjes.nl>
+Nigel Kilmer <nkilmer@google.com>
+Ole Langbehn <ole.langbehn@inoio.de>
+Ondrej Kvasnicka <ondrej.kvasnicka@firma.seznam.cz>
+Pablo Estrada <pabloem@google.com>
+Pablo Estrada <pabloem@users.noreply.github.com>
+Pascal Gula <pascal.gula@gmail.com>
+Pastuszka Przemysław <pastuszka.przemyslaw@gmail.com>
+Paul Gerver <pfgerver@gmail.com>
+Paul Gerver <pgerver@us.ibm.com>
+PaulVelthuis93 <paulvelthuis93@gmail.com>
+Pavel Slechta <pavel.slechta@firma.seznam.cz>
+Pawel Kaczmarczyk <pawel.pk.kaczmarczyk@gmail.com>
+Pawel Kaczmarczyk <p.kaczmarczyk@ocado.com>
+Pei He <hepei.hp@alibaba-inc.com>
+Pei He <hepeimail@gmail.com>
+Pei He <pei@apache.org>
+Pei He <peihe0@gmail.com>
+Pei He <peihe@google.com>
+Pei He <peihe@users.noreply.github.com>
+Peter Gergo Barna <pbarna@hortonworks.com>
+Petr Novotnik <petr.novotnik@firma.seznam.cz>
+Petr Shevtsov <petr.shevtsov@gmail.com>
+Pramod Immaneni <pramod@datatorrent.com>
+Prateek Chanda <prateekkol21@gmail.com>
+Prem Kumar Karunakaran <p.karunakaran@metrosystems.net>
+Radhika S Kulkarni <radhika_kulkarni1@persistent.co.in>
+Rafael Fernández <rfernand@google.com>
+Rafal Wojdyla <rav@spotify.com>
+Rafal Wojdyla <ravwojdyla@gmail.com>
+Raghu Angadi <rangadi@apache.org>
+Raghu Angadi <rangadi@google.com>
+Rahul Sabbineni <duraza@users.noreply.github.com>
+Renat <regata@users.noreply.github.com>
+Reuven Lax <relax@google.com>
+Reuven Lax <relax@relax-macbookpro2.roam.corp.google.com>
+Reuven Lax <relax@relax-macbookpro.roam.corp.google.com>
+Rezan Achmad <rezanachmad@gmail.com>
+Reza Rokni <7542791+rezarokni@users.noreply.github.com>
+Reza Rokni <rezarokni@google.com>
+Robbe Sneyders <robbe.sneyders@gmail.com>
+Robbe Sneyders <robbe.sneyders@ml6.eu>
+Rob Earhart <earhart@gmail.com>
+Rob Earhart <earhart@google.com>
+Robert Bradshaw <robertwb@gmail.com>
+Robert Bradshaw <robertwb@google.com>
+Robert Burke <lostluck@users.noreply.github.com>
+Robert Burke <rober@frantil.com>
+Robert Burke <robert@frantil.com>
+Roberto Congiu <rcongiu@agentace.com>
+Robin Qiu <robinyq@rodete-desktop-imager.corp.google.com>
+Rodrigo Benenson <rodrigo.benenson@gmail.com>
+Romain Manni-Bucau <rmannibucau@apache.org>
+Romain manni-Bucau <rmannibucau@gmail.com>
+Romain Manni-Bucau <rmannibucau@gmail.com>
+Romain Yon <yonromai@users.noreply.github.com>
+Rong Ou <rong.ou@gmail.com>
+Roy Lenferink <lenferinkroy@gmail.com>
+Rui Wang <amaliujia@163.com>
+Rui Wang <amaliujia@gmail.com>
+Rui Wang <amaliujia@users.noreply.github.com>
+Rune Fevang <fevang@exabel.com>
+Ruoyu Liu <ruoyu@google.com>
+Ruoyun Huang <huangry@gmail.com>
+Ryan Culbertson <ryan@spotify.com>
+Ryan Niemocienski <niemo@google.com>
+Ryan Skraba <ryan@skraba.com>
+Ryan Williams <ryan.blake.williams@gmail.com>
+sabhyankar <abhyankar@gmail.com>
+Sam McVeety <sam.mcveety@gmail.com>
+Sam McVeety <sgmc@google.com>
+Sam Rohde <rohde.samuel@gmail.com>
+Sam Waggoner <samuel.waggoner@healthsparq.com>
+Sam Whittle <samuelw@google.com>
+Sam Whittle <scwhittle@users.noreply.github.com>
+Sandeep Deshmukh <sandeep@datatorrent.com>
+Sandeep Parikh <sandeep@clusterbeep.org>
+Scott Wegner <scott@apache.com>
+Scott Wegner <scott@apache.org>
+Scott Wegner <swegner2@gmail.com>
+Scott Wegner <swegner@google.com>
+Scott Wegner <swegner@outlook.com>
+Sean O'Keefe <seano314@users.noreply.github.com>
+Sean Owen <sowen@cloudera.com>
+Sean Owen <srowen@gmail.com>
+Sela <ansela@paypal.com>
+Sergei Lebedev <s.lebedev@criteo.com>
+Sergey Beryozkin <sberyozkin@gmail.com>
+Sergio Fernández <sergio@wikier.org>
+Sergiy Byelozyorov <sergiyb@chromium.org>
+Seshadri Chakkravarthy <sesh.cr@gmail.com>
+Seunghyun Lee <shlee0605@gmail.com>
+Shashank Prabhakara <shashank@infoworks.io>
+Shinsuke Sugaya <shinsuke@apache.org>
+Shnitz <andrewktan@gmail.com>
+Silviu Calinoiu <silviuc@google.com>
+Simon Plovyt <40612002+splovyt@users.noreply.github.com>
+Sindy Li <qinyeli@qinyeli.svl.corp.google.com>
+Sindy Li <qinyeli@umich.edu>
+Slava Chernyak <chernyak@google.com>
+Slaven Bilac <slaven@google.com>
+SokolovMS <m.s.sokolov.92@gmail.com>
+Solomon Duskis <sduskis@google.com>
+Sourabh Bajaj <sb2nov@gmail.com>
+Sourabh Bajaj <sourabhbajaj@google.com>
+Stas Levin <staslevin@apache.org>
+Stas Levin <staslevin@gmail.com>
+Stas Levin <staslev@users.noreply.github.com>
+Stefano Baghino <stefano@baghino.me>
+Stepan Kadlec <stepan.kadlec@oracle.com>
+Stephan Ewen <sewen@apache.org>
+Stephan Hoyer <shoyer@google.com>
+Stephen Gildea <gildea@google.com>
+Stephen Lumenta <stephen.lumenta@gmail.com>
+Stephen Sisk <sisk@google.com>
+Stephen Sisk <ssisk@users.noreply.github.com>
+Steve Niemitz <sniemitz@twitter.com>
+Steve Wheeler <stevewheeler@google.com>
+Stijn Decubber <stijn.decubber@ml6.eu>
+Sumit Chawla <sumichaw@cisco.com>
+Sunil Pedapudi <skpedapudi@gmail.com>
+Taro Murao <taro.murao@gmail.com>
+Ted Yu <yuzhihong@gmail.com>
+Teng Peng <josephtengpeng@gmail.com>
+Theodore Siu <theosiu@theosiu-macbookpro24.roam.corp.google.com>
+Thomas Groh <tgroh@google.com>
+Thomas Groh <tgroh@users.noreply.github.com>
+Thomas Weise <thw@apache.org>
+Thomas Weise <tweise@lyft.com>
+Thomas Weise <tweise@users.noreply.github.com>
+Tianyang Hu <htyleo@gmail.com>
+Tibor Kiss <tibor.kiss@gmail.com>
+Tim Robertson <timrobertson100@gmail.com>
+Tim Robertson <timrobertson100@gmial.com>
+Tim Sears <sears.tim@gmail.com>
+Tobias Feldhaus <tobias.feldhaus@localsearch.ch>
+Tomas Novak <tomas.novak@firma.seznam.cz>
+Tomas Roos <ptomasroos@gmail.com>
+Tom Haines <thomas.haines@practiceinsight.io>
+Tom White <tom@cloudera.com>
+Tudor Marian <tudorm@google.com>
+Tyler Akidau <takidau@apache.org>
+Tyler Akidau <takidau@google.com>
+Udi Meiri <ehudm@google.com>
+Udi Meiri (Ehud) <udim@users.noreply.github.com>
+Udi Meiri <udim@users.noreply.github.com>
+Uri Silberstein <uri.silberstein@gmail.com>
+Uwe Jugel <uwe.jugel@lovoo.com>
+Vaclav Plajt <vaclav.plajt@firma.seznam.cz>
+Vaclav Plajt <vaclav.plajt@gmail.com>
+Valentyn Tymofieiev <valentyn@google.com>
+Valient Gough <vgough@google.com>
+Varun Dhussa <varundhussa@google.com>
+Vassil Kolarov <vas@vas.io>
+Vikas Kedigehalli <vikasrk@google.com>
+Vitalii Tverdokhlib <vitaliytv@nitralabs.com>
+Vladisav Jelisavcic <vj@apache.org>
+Vojtech Janota <vojtech.janota@oracle.com>
+Ward Van Assche <ward@piesync.com>
+Wesley Tanaka <wtanaka@yahoo.com>
+성준영 <wnsdud1861@gmail.com>
+Won Wook SONG <wonook@apache.org>
+Wout Scheepers <Wout.Scheepers@vente-exclusive.com>
+wslulciuc <willy@bounceexchange.com>
+Xin Wang <xinwang@apache.org>
+Xinyu Liu <xiliu@linkedin.com>
+Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
+Xinyu Liu <xinyuliu.us@gmail.com>
+Yifan Zou <35050780+yifanzou@users.noreply.github.com>
+Yifan Zou <yifanzou@google.com>
+Yifan Zou <yifanzou@rodete-desktop-imager.corp.google.com>
+Yifan Zou <yifanzou@yifanzou-linuxworkstation.sea.corp.google.com>
+Yifan Zou <yifanzou@yifanzou-macbookpro.roam.corp.google.com>
+Younghee Kwon <younghee.kwon@gmail.com>
+Yuan (Terry) Tang <terrytangyuan@gmail.com>
+Yueyang Qiu <robinyqiu@gmail.com>
+Yunqing Zhou <zhouyunqing@zhouyunqing-macbookpro3.roam.corp.google.com>
+Yunqing Zhou <zhouyunqing@zhouyunqing-macbookpro.roam.corp.google.com>
+Zang <szang@lm-sea-11001278.corp.ebay.com>
+Zhuo Peng <1835738+brills@users.noreply.github.com>
+zhuoyao <zhuoyao@google.com>
+Zohar Yahav <zoy@giggles.nyc.corp.google.com>
+Zohar Yahav <zoy@smtp.corp.google.com>
+Zongwei Zhou <zongweiz@google.com>
+Zur, Aviem <azur@paypal.com>
+波特 <haozhi.shz@alibaba-inc.com>
+琨瑜 <yiyan.lyy@alibaba-inc.com>
diff --git a/.test-infra/kubernetes/kafka-cluster/00-namespace.yml b/.test-infra/kubernetes/kafka-cluster/00-namespace.yml
new file mode 100644
index 0000000..5f9c317
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/00-namespace.yml
@@ -0,0 +1,19 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+ name: kafka
diff --git a/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-broker-pd.yml b/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-broker-pd.yml
new file mode 100644
index 0000000..2d12582
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-broker-pd.yml
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+ name: kafka-broker
+provisioner: kubernetes.io/gce-pd
+reclaimPolicy: Retain
+allowVolumeExpansion: true
+parameters:
+ type: pd-standard
diff --git a/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-zookeeper-ssd.yml b/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-zookeeper-ssd.yml
new file mode 100644
index 0000000..380fc95
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/01-configure/gke-storageclass-zookeeper-ssd.yml
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+ name: kafka-zookeeper
+provisioner: kubernetes.io/gce-pd
+reclaimPolicy: Retain
+allowVolumeExpansion: true
+parameters:
+ type: pd-ssd
diff --git a/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/node-reader.yml b/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/node-reader.yml
new file mode 100644
index 0000000..1fed01f
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/node-reader.yml
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+---
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: node-reader
+ labels:
+ origin: github.com_Yolean_kubernetes-kafka
+rules:
+- apiGroups:
+ - ""
+ resources:
+ - nodes
+ - services
+ verbs:
+ - get
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: kafka-node-reader
+ labels:
+ origin: github.com_Yolean_kubernetes-kafka
+roleRef:
+ apiGroup: rbac.authorization.k8s.io
+ kind: ClusterRole
+ name: node-reader
+subjects:
+- kind: ServiceAccount
+ name: default
+ namespace: kafka
diff --git a/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/pod-labler.yml b/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/pod-labler.yml
new file mode 100644
index 0000000..28f8fae
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/02-rbac-namespace-default/pod-labler.yml
@@ -0,0 +1,48 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+---
+kind: Role
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: pod-labler
+ namespace: kafka
+ labels:
+ origin: github.com_Yolean_kubernetes-kafka
+rules:
+- apiGroups:
+ - ""
+ resources:
+ - pods
+ verbs:
+ - get
+ - update
+ - patch
+---
+kind: RoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: kafka-pod-labler
+ namespace: kafka
+ labels:
+ origin: github.com_Yolean_kubernetes-kafka
+roleRef:
+ apiGroup: rbac.authorization.k8s.io
+ kind: Role
+ name: pod-labler
+subjects:
+- kind: ServiceAccount
+ name: default
+ namespace: kafka
diff --git a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/10zookeeper-config.yml b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/10zookeeper-config.yml
new file mode 100644
index 0000000..db75e1d
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/10zookeeper-config.yml
@@ -0,0 +1,56 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: ConfigMap
+metadata:
+ name: zookeeper-config
+ namespace: kafka
+apiVersion: v1
+data:
+ init.sh: |-
+ #!/bin/bash
+ set -e
+ set -x
+
+ [ -d /var/lib/zookeeper/data ] || mkdir /var/lib/zookeeper/data
+ [ -z "$ID_OFFSET" ] && ID_OFFSET=1
+ export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET))
+ echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid
+ cp -Lur /etc/kafka-configmap/* /etc/kafka/
+ sed -i "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties
+
+ zookeeper.properties: |-
+ tickTime=2000
+ dataDir=/var/lib/zookeeper/data
+ dataLogDir=/var/lib/zookeeper/log
+ clientPort=2181
+ maxClientCnxns=1
+ initLimit=5
+ syncLimit=2
+ server.1=pzoo-0.pzoo:2888:3888:participant
+ server.2=pzoo-1.pzoo:2888:3888:participant
+ server.3=pzoo-2.pzoo:2888:3888:participant
+ server.4=zoo-0.zoo:2888:3888:participant
+ server.5=zoo-1.zoo:2888:3888:participant
+
+ log4j.properties: |-
+ log4j.rootLogger=INFO, stdout
+ log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+ log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+ log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ # Suppress connection log messages, three lines per livenessProbe execution
+ log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN
+ log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN
diff --git a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/20pzoo-service.yml b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/20pzoo-service.yml
new file mode 100644
index 0000000..00cc81e
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/20pzoo-service.yml
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: Service
+metadata:
+ name: pzoo
+ namespace: kafka
+spec:
+ ports:
+ - port: 2888
+ name: peer
+ - port: 3888
+ name: leader-election
+ clusterIP: None
+ selector:
+ app: zookeeper
+ storage: persistent
diff --git a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml
new file mode 100644
index 0000000..08e7350
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/30service.yml
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: Service
+metadata:
+ name: zookeeper
+ namespace: kafka
+spec:
+ ports:
+ - port: 2181
+ name: client
+ selector:
+ app: zookeeper
diff --git a/.test-infra/kubernetes/kafka-cluster/03-zookeeper/50pzoo.yml b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/50pzoo.yml
new file mode 100644
index 0000000..cea4eb1
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/03-zookeeper/50pzoo.yml
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+ name: pzoo
+ namespace: kafka
+spec:
+ selector:
+ matchLabels:
+ app: zookeeper
+ storage: persistent
+ serviceName: "pzoo"
+ replicas: 3
+ updateStrategy:
+ type: RollingUpdate
+ podManagementPolicy: Parallel
+ template:
+ metadata:
+ labels:
+ app: zookeeper
+ storage: persistent
+ annotations:
+ spec:
+ terminationGracePeriodSeconds: 10
+ initContainers:
+ - name: init-config
+ image: solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea
+ command: ['/bin/bash', '/etc/kafka-configmap/init.sh']
+ volumeMounts:
+ - name: configmap
+ mountPath: /etc/kafka-configmap
+ - name: config
+ mountPath: /etc/kafka
+ - name: data
+ mountPath: /var/lib/zookeeper
+ containers:
+ - name: zookeeper
+ image: solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1
+ env:
+ - name: KAFKA_LOG4J_OPTS
+ value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
+ command:
+ - ./bin/zookeeper-server-start.sh
+ - /etc/kafka/zookeeper.properties
+ lifecycle:
+ preStop:
+ exec:
+ command: ["sh", "-ce", "kill -s TERM 1; while $(kill -0 1 2>/dev/null); do sleep 1; done"]
+ ports:
+ - containerPort: 2181
+ name: client
+ - containerPort: 2888
+ name: peer
+ - containerPort: 3888
+ name: leader-election
+ resources:
+ requests:
+ cpu: 10m
+ memory: 100Mi
+ limits:
+ memory: 120Mi
+ readinessProbe:
+ exec:
+ command:
+ - /bin/sh
+ - -c
+ - '[ "imok" = "$(echo ruok | nc -w 1 -q 1 127.0.0.1 2181)" ]'
+ volumeMounts:
+ - name: config
+ mountPath: /etc/kafka
+ - name: data
+ mountPath: /var/lib/zookeeper
+ volumes:
+ - name: configmap
+ configMap:
+ name: zookeeper-config
+ - name: config
+ emptyDir: {}
+ volumeClaimTemplates:
+ - metadata:
+ name: data
+ spec:
+ accessModes: [ "ReadWriteOnce" ]
+ storageClassName: kafka-zookeeper
+ resources:
+ requests:
+ storage: 1Gi
diff --git a/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml
new file mode 100644
index 0000000..bbadf76
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: Service
+apiVersion: v1
+metadata:
+ name: outside-0
+ namespace: kafka
+spec:
+ selector:
+ app: kafka
+ kafka-broker-id: "0"
+ ports:
+ - protocol: TCP
+ targetPort: 9094
+ port: 32400
+ nodePort: 32400
+ type: LoadBalancer
diff --git a/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml
new file mode 100644
index 0000000..ea5fc9d
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: Service
+apiVersion: v1
+metadata:
+ name: outside-1
+ namespace: kafka
+spec:
+ selector:
+ app: kafka
+ kafka-broker-id: "1"
+ ports:
+ - protocol: TCP
+ targetPort: 9094
+ port: 32401
+ nodePort: 32401
+ type: LoadBalancer
diff --git a/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml
new file mode 100644
index 0000000..d7f1eac
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: Service
+apiVersion: v1
+metadata:
+ name: outside-2
+ namespace: kafka
+spec:
+ selector:
+ app: kafka
+ kafka-broker-id: "2"
+ ports:
+ - protocol: TCP
+ targetPort: 9094
+ port: 32402
+ nodePort: 32402
+ type: LoadBalancer
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/10broker-config.yml b/.test-infra/kubernetes/kafka-cluster/05-kafka/10broker-config.yml
new file mode 100644
index 0000000..27bc4e7
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/10broker-config.yml
@@ -0,0 +1,275 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+kind: ConfigMap
+metadata:
+ name: broker-config
+ namespace: kafka
+apiVersion: v1
+data:
+ init.sh: |-
+ #!/bin/bash
+ set -e
+ set -x
+ cp /etc/kafka-configmap/log4j.properties /etc/kafka/
+
+ KAFKA_BROKER_ID=${HOSTNAME##*-}
+ SEDS=("s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/")
+ LABELS="kafka-broker-id=$KAFKA_BROKER_ID"
+ ANNOTATIONS=""
+
+ hash kubectl 2>/dev/null || {
+ SEDS+=("s/#init#broker.rack=#init#/#init#broker.rack=# kubectl not found in path/")
+ } && {
+ ZONE=$(kubectl get node "$NODE_NAME" -o=go-template='{{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}')
+ if [ "x$ZONE" == "x<no value>" ]; then
+ SEDS+=("s/#init#broker.rack=#init#/#init#broker.rack=# zone label not found for node $NODE_NAME/")
+ else
+ SEDS+=("s/#init#broker.rack=#init#/broker.rack=$ZONE/")
+ LABELS="$LABELS kafka-broker-rack=$ZONE"
+ fi
+ OUTSIDE_HOST=""
+ while [ -z $OUTSIDE_HOST ]; do
+ echo "Waiting for end point..."
+ OUTSIDE_HOST=$(kubectl get svc outside-${KAFKA_BROKER_ID} --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}")
+ [ -z "$OUTSIDE_HOST" ] && sleep 10
+ done
+ # OUTSIDE_HOST=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
+ OUTSIDE_PORT=3240${KAFKA_BROKER_ID}
+ SEDS+=("s|#init#advertised.listeners=OUTSIDE://#init#|advertised.listeners=OUTSIDE://${OUTSIDE_HOST}:${OUTSIDE_PORT}|")
+ ANNOTATIONS="$ANNOTATIONS kafka-listener-outside-host=$OUTSIDE_HOST kafka-listener-outside-port=$OUTSIDE_PORT"
+
+ if [ ! -z "$LABELS" ]; then
+ kubectl -n $POD_NAMESPACE label pod $POD_NAME $LABELS || echo "Failed to label $POD_NAMESPACE.$POD_NAME - RBAC issue?"
+ fi
+ if [ ! -z "$ANNOTATIONS" ]; then
+ kubectl -n $POD_NAMESPACE annotate pod $POD_NAME $ANNOTATIONS || echo "Failed to annotate $POD_NAMESPACE.$POD_NAME - RBAC issue?"
+ fi
+ }
+ printf '%s\n' "${SEDS[@]}" | sed -f - /etc/kafka-configmap/server.properties > /etc/kafka/server.properties.tmp
+ [ $? -eq 0 ] && mv /etc/kafka/server.properties.tmp /etc/kafka/server.properties
+
+ server.properties: |-
+ ############################# Log Basics #############################
+
+    # A comma separated list of directories under which to store log files
+ # Overrides log.dir
+ log.dirs=/var/lib/kafka/data/topics
+
+ # The default number of log partitions per topic. More partitions allow greater
+ # parallelism for consumption, but this will also result in more files across
+ # the brokers.
+ num.partitions=12
+
+ default.replication.factor=3
+
+ min.insync.replicas=2
+
+ auto.create.topics.enable=false
+
+ # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
+    # This value is recommended to be increased for installations with data dirs located in a RAID array.
+ #num.recovery.threads.per.data.dir=1
+
+ ############################# Server Basics #############################
+
+ # The id of the broker. This must be set to a unique integer for each broker.
+ #init#broker.id=#init#
+
+ #init#broker.rack=#init#
+
+ ############################# Socket Server Settings #############################
+
+ # The address the socket server listens on. It will get the value returned from
+ # java.net.InetAddress.getCanonicalHostName() if not configured.
+ # FORMAT:
+ # listeners = listener_name://host_name:port
+ # EXAMPLE:
+ # listeners = PLAINTEXT://your.host.name:9092
+ #listeners=PLAINTEXT://:9092
+ listeners=OUTSIDE://:9094,PLAINTEXT://:9092
+
+ # Hostname and port the broker will advertise to producers and consumers. If not set,
+ # it uses the value for "listeners" if configured. Otherwise, it will use the value
+ # returned from java.net.InetAddress.getCanonicalHostName().
+ #advertised.listeners=PLAINTEXT://your.host.name:9092
+ #init#advertised.listeners=OUTSIDE://#init#,PLAINTEXT://:9092
+
+ # Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
+ #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
+ listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,OUTSIDE:PLAINTEXT
+ inter.broker.listener.name=PLAINTEXT
+
+ # The number of threads that the server uses for receiving requests from the network and sending responses to the network
+ #num.network.threads=3
+
+ # The number of threads that the server uses for processing requests, which may include disk I/O
+ #num.io.threads=8
+
+ # The send buffer (SO_SNDBUF) used by the socket server
+ #socket.send.buffer.bytes=102400
+
+ # The receive buffer (SO_RCVBUF) used by the socket server
+ #socket.receive.buffer.bytes=102400
+
+ # The maximum size of a request that the socket server will accept (protection against OOM)
+ #socket.request.max.bytes=104857600
+
+ ############################# Internal Topic Settings #############################
+ # The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
+    # For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
+ #offsets.topic.replication.factor=1
+ #transaction.state.log.replication.factor=1
+ #transaction.state.log.min.isr=1
+
+ ############################# Log Flush Policy #############################
+
+ # Messages are immediately written to the filesystem but by default we only fsync() to sync
+ # the OS cache lazily. The following configurations control the flush of data to disk.
+ # There are a few important trade-offs here:
+ # 1. Durability: Unflushed data may be lost if you are not using replication.
+ # 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
+ # 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
+ # The settings below allow one to configure the flush policy to flush data after a period of time or
+ # every N messages (or both). This can be done globally and overridden on a per-topic basis.
+
+ # The number of messages to accept before forcing a flush of data to disk
+ #log.flush.interval.messages=10000
+
+ # The maximum amount of time a message can sit in a log before we force a flush
+ #log.flush.interval.ms=1000
+
+ ############################# Log Retention Policy #############################
+
+ # The following configurations control the disposal of log segments. The policy can
+ # be set to delete segments after a period of time, or after a given size has accumulated.
+ # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
+ # from the end of the log.
+
+ # https://cwiki.apache.org/confluence/display/KAFKA/KIP-186%3A+Increase+offsets+retention+default+to+7+days
+ offsets.retention.minutes=10080
+
+ # The minimum age of a log file to be eligible for deletion due to age
+ log.retention.hours=-1
+
+ # A size-based retention policy for logs. Segments are pruned from the log unless the remaining
+ # segments drop below log.retention.bytes. Functions independently of log.retention.hours.
+ #log.retention.bytes=1073741824
+
+ # The maximum size of a log segment file. When this size is reached a new log segment will be created.
+ #log.segment.bytes=1073741824
+
+ # The interval at which log segments are checked to see if they can be deleted according
+ # to the retention policies
+ #log.retention.check.interval.ms=300000
+
+ ############################# Zookeeper #############################
+
+ # Zookeeper connection string (see zookeeper docs for details).
+    # This is a comma separated list of host:port pairs, each corresponding to a zk
+ # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
+ # You can also append an optional chroot string to the urls to specify the
+ # root directory for all kafka znodes.
+ zookeeper.connect=zookeeper:2181
+
+ # Timeout in ms for connecting to zookeeper
+ #zookeeper.connection.timeout.ms=6000
+
+
+ ############################# Group Coordinator Settings #############################
+
+ # The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
+ # The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
+ # The default value for this is 3 seconds.
+ # We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
+ # However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
+ #group.initial.rebalance.delay.ms=0
+
+ log4j.properties: |-
+ # Unspecified loggers and loggers with additivity=true output to server.log and stdout
+ # Note that INFO only applies to unspecified loggers, the log level of the child logger is used otherwise
+ log4j.rootLogger=INFO, stdout
+
+ log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+ log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+ log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
+ log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
+ log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log
+ log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
+ log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log
+ log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
+ log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH
+ log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log
+ log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
+ log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
+
+ # Change the two lines below to adjust ZK client logging
+ log4j.logger.org.I0Itec.zkclient.ZkClient=INFO
+ log4j.logger.org.apache.zookeeper=INFO
+
+ # Change the two lines below to adjust the general broker logging level (output to server.log and stdout)
+ log4j.logger.kafka=INFO
+ log4j.logger.org.apache.kafka=INFO
+
+ # Change to DEBUG or TRACE to enable request logging
+ log4j.logger.kafka.request.logger=WARN, requestAppender
+ log4j.additivity.kafka.request.logger=false
+
+ # Uncomment the lines below and change log4j.logger.kafka.network.RequestChannel$ to TRACE for additional output
+ # related to the handling of requests
+ #log4j.logger.kafka.network.Processor=TRACE, requestAppender
+ #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender
+ #log4j.additivity.kafka.server.KafkaApis=false
+ log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender
+ log4j.additivity.kafka.network.RequestChannel$=false
+
+ log4j.logger.kafka.controller=TRACE, controllerAppender
+ log4j.additivity.kafka.controller=false
+
+ log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
+ log4j.additivity.kafka.log.LogCleaner=false
+
+ log4j.logger.state.change.logger=TRACE, stateChangeAppender
+ log4j.additivity.state.change.logger=false
+
+ # Change to DEBUG to enable audit log for the authorizer
+ log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender
+ log4j.additivity.kafka.authorizer.logger=false
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/20dns.yml b/.test-infra/kubernetes/kafka-cluster/05-kafka/20dns.yml
new file mode 100644
index 0000000..2e14e76
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/20dns.yml
@@ -0,0 +1,27 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# A headless service to create DNS records
+---
+apiVersion: v1
+kind: Service
+metadata:
+ name: broker
+ namespace: kafka
+spec:
+ ports:
+ - port: 9092
+ clusterIP: None
+ selector:
+ app: kafka
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/30bootstrap-service.yml b/.test-infra/kubernetes/kafka-cluster/05-kafka/30bootstrap-service.yml
new file mode 100644
index 0000000..5428795
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/30bootstrap-service.yml
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+---
+apiVersion: v1
+kind: Service
+metadata:
+ name: bootstrap
+ namespace: kafka
+spec:
+ ports:
+ - port: 9092
+ selector:
+ app: kafka
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/50kafka.yml b/.test-infra/kubernetes/kafka-cluster/05-kafka/50kafka.yml
new file mode 100644
index 0000000..9e19a74
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/50kafka.yml
@@ -0,0 +1,120 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+ name: kafka
+ namespace: kafka
+spec:
+ selector:
+ matchLabels:
+ app: kafka
+ serviceName: "broker"
+ replicas: 3
+ updateStrategy:
+ type: RollingUpdate
+ podManagementPolicy: Parallel
+ template:
+ metadata:
+ labels:
+ app: kafka
+ annotations:
+ spec:
+ terminationGracePeriodSeconds: 30
+ initContainers:
+ - name: init-config
+ image: solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea
+ env:
+ - name: NODE_NAME
+ valueFrom:
+ fieldRef:
+ fieldPath: spec.nodeName
+ - name: POD_NAME
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.name
+ - name: POD_NAMESPACE
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.namespace
+ command: ['/bin/bash', '/etc/kafka-configmap/init.sh']
+ volumeMounts:
+ - name: configmap
+ mountPath: /etc/kafka-configmap
+ - name: config
+ mountPath: /etc/kafka
+ - name: extensions
+ mountPath: /opt/kafka/libs/extensions
+ containers:
+ - name: broker
+ image: solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1
+ env:
+ - name: CLASSPATH
+ value: /opt/kafka/libs/extensions/*
+ - name: KAFKA_LOG4J_OPTS
+ value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
+ - name: JMX_PORT
+ value: "5555"
+ ports:
+ - name: inside
+ containerPort: 9092
+ - name: outside
+ containerPort: 9094
+ - name: jmx
+ containerPort: 5555
+ command:
+ - ./bin/kafka-server-start.sh
+ - /etc/kafka/server.properties
+ lifecycle:
+ preStop:
+ exec:
+ command: ["sh", "-ce", "kill -s TERM 1; while $(kill -0 1 2>/dev/null); do sleep 1; done"]
+ resources:
+ requests:
+ cpu: 100m
+ memory: 100Mi
+ limits:
+ # This limit was intentionally set low as a reminder that
+          # the entire Yolean/kubernetes-kafka setup is meant to be tweaked
+ # before you run production workloads
+ memory: 600Mi
+ readinessProbe:
+ tcpSocket:
+ port: 9092
+ timeoutSeconds: 1
+ volumeMounts:
+ - name: config
+ mountPath: /etc/kafka
+ - name: data
+ mountPath: /var/lib/kafka/data
+ - name: extensions
+ mountPath: /opt/kafka/libs/extensions
+ volumes:
+ - name: configmap
+ configMap:
+ name: broker-config
+ - name: config
+ emptyDir: {}
+ - name: extensions
+ emptyDir: {}
+ volumeClaimTemplates:
+ - metadata:
+ name: data
+ spec:
+ accessModes: [ "ReadWriteOnce" ]
+ storageClassName: kafka-broker
+ resources:
+ requests:
+ storage: 10Gi
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml b/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml
new file mode 100644
index 0000000..cd52225
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/configmap-config.yaml
@@ -0,0 +1,38 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ namespace: kafka
+ name: kafka-config
+data:
+ runtimeConfig.sh: |
+ #!/bin/bash
+ set -e
+ cd /usr/bin
+ until kafka-configs --zookeeper zookeeper:2181 --entity-type topics --describe || (( count++ >= 6 ))
+ do
+ echo "Waiting for Zookeeper..."
+ sleep 20
+ done
+ until nc -z kafka 9092 || (( retries++ >= 6 ))
+ do
+ echo "Waiting for Kafka..."
+ sleep 20
+ done
+ echo "Applying runtime configuration using confluentinc/cp-kafka:5.0.1"
+ kafka-topics --zookeeper zookeeper:2181 --create --if-not-exists --force --topic apache-beam-load-test --partitions 3 --replication-factor 2
+ kafka-configs --zookeeper zookeeper:2181 --entity-type topics --entity-name apache-beam-load-test --describe
diff --git a/.test-infra/kubernetes/kafka-cluster/05-kafka/job-config.yaml b/.test-infra/kubernetes/kafka-cluster/05-kafka/job-config.yaml
new file mode 100644
index 0000000..1aee3df
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/05-kafka/job-config.yaml
@@ -0,0 +1,40 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+---
+# Source: kafka/templates/job-config.yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+ name: "kafka-config-eff079ec"
+ namespace: kafka
+spec:
+ template:
+ metadata:
+ labels:
+ app: kafka
+ spec:
+ restartPolicy: OnFailure
+ volumes:
+ - name: config-volume
+ configMap:
+ name: kafka-config
+ defaultMode: 0744
+ containers:
+ - name: kafka-config
+ image: "confluentinc/cp-kafka:5.0.1"
+ command: ["/usr/local/script/runtimeConfig.sh"]
+ volumeMounts:
+ - name: config-volume
+ mountPath: "/usr/local/script"
diff --git a/.test-infra/kubernetes/kafka-cluster/README.md b/.test-infra/kubernetes/kafka-cluster/README.md
new file mode 100644
index 0000000..124d300
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/README.md
@@ -0,0 +1,30 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+## Kafka test cluster
+
+The Kubernetes config files in this directory create a Kafka cluster
+consisting of 3 Kafka replicas and 3 ZooKeeper replicas, exposed
+through 3 LoadBalancer services. To deploy the cluster, simply run
+
+ sh setup-cluster.sh
+
+Before executing the script, ensure that your account has cluster-admin
+privileges, since setting up the RBAC cluster roles requires them.
+
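+Once the script has finished, you can check that the broker pods and their
+services came up correctly, for example with:
+
+    kubectl get pods,services --namespace kafka
+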
+The scripts are based on [Yolean kubernetes-kafka](https://github.com/Yolean/kubernetes-kafka).
diff --git a/.test-infra/kubernetes/kafka-cluster/setup-cluster.sh b/.test-infra/kubernetes/kafka-cluster/setup-cluster.sh
new file mode 100755
index 0000000..65c021a
--- /dev/null
+++ b/.test-infra/kubernetes/kafka-cluster/setup-cluster.sh
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kubectl apply -R -f .
diff --git a/build.gradle b/build.gradle
index e34eaa2..c0f9091 100644
--- a/build.gradle
+++ b/build.gradle
@@ -26,7 +26,7 @@
// See https://github.com/ben-manes/gradle-versions-plugin for further details.
id 'com.github.ben-manes.versions' version '0.17.0'
// Apply one top level rat plugin to perform any required license enforcement analysis
- id 'org.nosphere.apache.rat' version '0.3.1'
+ id 'org.nosphere.apache.rat' version '0.4.0'
// Enable gradle-based release management
id 'net.researchgate.release' version '2.6.0'
id 'org.apache.beam.module'
@@ -113,12 +113,6 @@
exclusions.addAll(gitIgnoreExcludes)
}
- // Combining verbose with only XML output has each failing license logged.
- // See https://github.com/eskatos/creadur-rat-gradle/issues/8 for further details.
- verbose = true
- plainOutput = false
- xmlOutput = true
- htmlOutput = false
failOnError = true
excludes = exclusions
}
diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index 3a22c72..918df02 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -454,6 +454,7 @@
hamcrest_core : "org.hamcrest:hamcrest-core:$hamcrest_version",
hamcrest_library : "org.hamcrest:hamcrest-library:$hamcrest_version",
jackson_annotations : "com.fasterxml.jackson.core:jackson-annotations:$jackson_version",
+ jackson_jaxb_annotations : "com.fasterxml.jackson.module:jackson-module-jaxb-annotations:$jackson_version",
jackson_core : "com.fasterxml.jackson.core:jackson-core:$jackson_version",
jackson_databind : "com.fasterxml.jackson.core:jackson-databind:$jackson_version",
jackson_dataformat_cbor : "com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:$jackson_version",
@@ -461,7 +462,7 @@
jackson_datatype_joda : "com.fasterxml.jackson.datatype:jackson-datatype-joda:$jackson_version",
jackson_module_scala : "com.fasterxml.jackson.module:jackson-module-scala_2.11:$jackson_version",
jaxb_api : "javax.xml.bind:jaxb-api:$jaxb_api_version",
- joda_time : "joda-time:joda-time:2.4",
+ joda_time : "joda-time:joda-time:2.10.1",
junit : "junit:junit:4.13-beta-1",
kafka_2_11 : "org.apache.kafka:kafka_2.11:$kafka_version",
kafka_clients : "org.apache.kafka:kafka-clients:$kafka_version",
@@ -755,7 +756,10 @@
project.checkstyle { toolVersion = "8.7" }
// Configures javadoc plugin and ensure check runs javadoc.
- project.tasks.withType(Javadoc) { options.encoding = 'UTF-8' }
+ project.tasks.withType(Javadoc) {
+ options.encoding = 'UTF-8'
+ options.addBooleanOption('Xdoclint:-missing', true)
+ }
project.check.dependsOn project.javadoc
// Apply the eclipse and apt-eclipse plugins. This adds the "eclipse" task and
@@ -1593,19 +1597,20 @@
project.ext.envdir = "${project.rootProject.buildDir}/gradleenv/${project.name.hashCode()}"
def pythonRootDir = "${project.rootDir}/sdks/python"
- // This is current supported Python3 version. It should match the one in
- // sdks/python/container/py3/Dockerfile
- final PYTHON3_VERSION = '3.5'
+    // Python interpreter version used for virtualenv setup and test runs. This value can be
+    // set from the command line with -PpythonVersion, or in the build script of a particular
+    // project. If neither is provided, the version set here is used as the default.
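+    // For example (illustrative invocation): ./gradlew setupVirtualenv -PpythonVersion=3.5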
+ if(!project.hasProperty('pythonVersion')) {
+ project.ext.pythonVersion = '2.7'
+ }
project.task('setupVirtualenv') {
doLast {
def virtualenvCmd = [
'virtualenv',
"${project.ext.envdir}",
+ "--python=python${project.ext.pythonVersion}",
]
- if (project.hasProperty('python3')) {
- virtualenvCmd += '--python=python' + PYTHON3_VERSION
- }
project.exec { commandLine virtualenvCmd }
project.exec {
executable 'sh'
diff --git a/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java b/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java
index 14a11b7..181d190 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java
@@ -41,15 +41,15 @@
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
import org.apache.beam.sdk.util.Sleeper;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
index ea22fbe..0b0023e 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
@@ -28,6 +28,8 @@
import org.apache.beam.sdk.coders.StringDelegateCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
@@ -47,8 +49,6 @@
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
index e4aaaa5..faa1b42 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
@@ -22,6 +22,7 @@
import java.math.BigDecimal;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
@@ -41,7 +42,6 @@
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ComparisonChain;
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DistinctExample.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DistinctExample.java
index 0f6a52f..c6f0a23 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DistinctExample.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DistinctExample.java
@@ -18,6 +18,7 @@
package org.apache.beam.examples.cookbook;
import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.DefaultValueFactory;
@@ -25,7 +26,6 @@
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Distinct;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
/**
* This example uses as input Shakespeare's plays as plaintext files, and will remove any duplicate
diff --git a/examples/java/src/test/java/org/apache/beam/examples/MinimalWordCountTest.java b/examples/java/src/test/java/org/apache/beam/examples/MinimalWordCountTest.java
index 6cb2c98..d7039d3 100644
--- a/examples/java/src/test/java/org/apache/beam/examples/MinimalWordCountTest.java
+++ b/examples/java/src/test/java/org/apache/beam/examples/MinimalWordCountTest.java
@@ -24,14 +24,14 @@
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/examples/java/src/test/java/org/apache/beam/examples/subprocess/ExampleEchoPipelineTest.java b/examples/java/src/test/java/org/apache/beam/examples/subprocess/ExampleEchoPipelineTest.java
index bf11558..ec51801 100644
--- a/examples/java/src/test/java/org/apache/beam/examples/subprocess/ExampleEchoPipelineTest.java
+++ b/examples/java/src/test/java/org/apache/beam/examples/subprocess/ExampleEchoPipelineTest.java
@@ -34,14 +34,14 @@
import org.apache.beam.examples.subprocess.kernel.SubProcessKernel;
import org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml b/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
index f7ede2b..2629cb5 100644
--- a/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
+++ b/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
@@ -207,3 +207,19 @@
pane: {is_first: True, is_last: True, timing: UNKNOWN, index: 0, on_time_index: 0},
windows: [{end: 1454293425000, span: 3600000}, {end: -9223372036854410, span: 365}]
}
+
+---
+
+coder:
+ urn: "beam:coder:double:v1"
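+# Each example key below is the encoded form of the value: its 8-byte, big-endian
+# IEEE 754 representation. The corresponding entry is the decoded double as a string.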
+examples:
+ "\0\0\0\0\0\0\0\0": "0"
+ "\u0080\0\0\0\0\0\0\0": "-0"
+ "\u003f\u00b9\u0099\u0099\u0099\u0099\u0099\u009a": "0.1"
+ "\u00bf\u00b9\u0099\u0099\u0099\u0099\u0099\u009a": "-0.1"
+ "\0\0\0\0\0\0\0\u0001": "4.9e-324"
+ "\0\u0001\0\0\0\0\0\0": "1.390671161567e-309"
+ "\u007f\u00ef\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff": "1.7976931348623157e308"
+ "\u007f\u00f0\0\0\0\0\0\0": "Infinity"
+ "\u00ff\u00f0\0\0\0\0\0\0": "-Infinity"
+ "\u007f\u00f8\0\0\0\0\0\0": "NaN"
diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto
index fa2461f..28891df 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -303,6 +303,21 @@
// and restrictions.
// Input: KV(element, restriction); output: DoFn's output.
PROCESS_ELEMENTS = 3 [(beam_urn) = "beam:transform:sdf_process_elements:v1"];
+
+ // Splits the restriction of each element/restriction pair and returns the
+    // resulting splits, each with a corresponding floating-point size
+    // estimate.
+ // A reasonable value for size is the number of bytes expected to be
+ // produced by this (element, restriction) pair.
+ // Input: KV(element, restriction)
+    // Output: KV(KV(element, restriction), size)
+ SPLIT_AND_SIZE_RESTRICTIONS = 4 [(beam_urn) = "beam:transform:sdf_split_and_size_restrictions:v1"];
+
+ // Like PROCESS_ELEMENTS, but accepts the sized output produced by
+    // SPLIT_AND_SIZE_RESTRICTIONS.
+ // Input: KV(KV(element, restriction), size); output: DoFn's output.
+ PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS = 5 [(beam_urn) = "beam:transform:sdf_process_sized_element_and_restrictions:v1"];
+
}
}
@@ -635,6 +650,61 @@
}
}
+// Experimental: A representation of a Beam Schema.
+message Schema {
+ enum TypeName {
+ BYTE = 0;
+ INT16 = 1;
+ INT32 = 2;
+ INT64 = 3;
+ DECIMAL = 4;
+ FLOAT = 5;
+ DOUBLE = 6;
+ STRING = 7;
+ DATETIME = 8;
+ BOOLEAN = 9;
+ BYTES = 10;
+ ARRAY = 11;
+ MAP = 13;
+ ROW = 14;
+ LOGICAL_TYPE = 15;
+ }
+
+ message LogicalType {
+ string id = 1;
+ string args = 2;
+ FieldType base_type = 3;
+ bytes serialized_class = 4;
+ }
+
+ message MapType {
+ FieldType key_type = 1;
+ FieldType value_type = 2;
+ }
+
+ message FieldType {
+ TypeName type_name = 1;
+ bool nullable = 2;
+ oneof type_info {
+ FieldType collection_element_type = 3;
+ MapType map_type = 4;
+ Schema row_schema = 5;
+ LogicalType logical_type = 6;
+ }
+ }
+
+ message Field {
+ string name = 1;
+ string description = 2;
+ FieldType type = 3;
+ int32 id = 4;
+ int32 encoding_position = 5;
+ }
+
+ repeated Field fields = 1;
+ string id = 2;
+}
+
// A windowing strategy describes the window function, triggering, allowed
// lateness, and accumulation mode for a PCollection.
//
diff --git a/model/pipeline/src/main/proto/metrics.proto b/model/pipeline/src/main/proto/metrics.proto
index 15f23b3..43ec994 100644
--- a/model/pipeline/src/main/proto/metrics.proto
+++ b/model/pipeline/src/main/proto/metrics.proto
@@ -187,37 +187,6 @@
google.protobuf.Timestamp timestamp = 6;
}
-message MonitoringInfoUrns {
- enum Enum {
- // User counter have this format: 'beam:metric:user:<namespace>:<name>'.
- USER_COUNTER_URN_PREFIX = 0
- [(org.apache.beam.model.pipeline.v1.beam_urn) = "beam:metric:user:"];
-
- ELEMENT_COUNT = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:element_count:v1"];
-
- START_BUNDLE_MSECS = 2
- [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:pardo_execution_time:start_bundle_msecs:v1"];
-
- PROCESS_BUNDLE_MSECS = 3
- [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:pardo_execution_time:process_bundle_msecs:v1"];
-
- FINISH_BUNDLE_MSECS = 4
- [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:pardo_execution_time:finish_bundle_msecs:v1"];
-
- TOTAL_MSECS = 5
- [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:ptransform_execution_time:total_msecs:v1"];
-
- USER_DISTRIBUTION_COUNTER_URN_PREFIX = 6
- [(org.apache.beam.model.pipeline.v1.beam_urn) =
- "beam:metric:user_distribution:"];
- }
-}
-
message MonitoringInfoTypeUrns {
enum Enum {
SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
diff --git a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoTranslator.java b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoTranslator.java
index 35117f7..9f20718 100644
--- a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoTranslator.java
+++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoTranslator.java
@@ -37,7 +37,6 @@
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
-import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.PValue;
@@ -130,13 +129,12 @@
}
}
- static class SplittableProcessElementsTranslator<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
- implements TransformTranslator<ProcessElements<InputT, OutputT, RestrictionT, TrackerT>> {
+ static class SplittableProcessElementsTranslator<InputT, OutputT, RestrictionT, PositionT>
+ implements TransformTranslator<ProcessElements<InputT, OutputT, RestrictionT, PositionT>> {
@Override
public void translate(
- ProcessElements<InputT, OutputT, RestrictionT, TrackerT> transform,
+ ProcessElements<InputT, OutputT, RestrictionT, PositionT> transform,
TranslationContext context) {
Map<TupleTag<?>, PValue> outputs = context.getOutputs();
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java
index 8843125..8e5d30c 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java
@@ -24,6 +24,7 @@
import java.util.Set;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.LengthPrefixCoder;
@@ -54,6 +55,7 @@
.put(LengthPrefixCoder.class, ModelCoders.LENGTH_PREFIX_CODER_URN)
.put(GlobalWindow.Coder.class, ModelCoders.GLOBAL_WINDOW_CODER_URN)
.put(FullWindowedValueCoder.class, ModelCoders.WINDOWED_VALUE_CODER_URN)
+ .put(DoubleCoder.class, ModelCoders.DOUBLE_CODER_URN)
.build();
public static final Set<String> WELL_KNOWN_CODER_URNS = BEAM_MODEL_CODER_URNS.values();
@@ -70,6 +72,7 @@
.put(Timer.Coder.class, CoderTranslators.timer())
.put(LengthPrefixCoder.class, CoderTranslators.lengthPrefix())
.put(FullWindowedValueCoder.class, CoderTranslators.fullWindowedValue())
+ .put(DoubleCoder.class, CoderTranslators.atomic(DoubleCoder.class))
.build();
static {
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java
index 3c6dfba..720bd6c 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoders.java
@@ -37,6 +37,8 @@
// coders?
public static final String INT64_CODER_URN = getUrn(StandardCoders.Enum.VARINT);
+ public static final String DOUBLE_CODER_URN = getUrn(StandardCoders.Enum.DOUBLE);
+
public static final String ITERABLE_CODER_URN = getUrn(StandardCoders.Enum.ITERABLE);
public static final String TIMER_CODER_URN = getUrn(StandardCoders.Enum.TIMER);
public static final String KV_CODER_URN = getUrn(StandardCoders.Enum.KV);
@@ -61,7 +63,8 @@
LENGTH_PREFIX_CODER_URN,
GLOBAL_WINDOW_CODER_URN,
INTERVAL_WINDOW_CODER_URN,
- WINDOWED_VALUE_CODER_URN);
+ WINDOWED_VALUE_CODER_URN,
+ DOUBLE_CODER_URN);
public static Set<String> urns() {
return MODEL_CODER_URNS;
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java
new file mode 100644
index 0000000..2ad7335
--- /dev/null
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.UUID;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.LogicalType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.SerializableUtils;
+import org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.ByteString;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.BiMap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableBiMap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Maps;
+
+/** Utility methods for translating schemas. */
+public class SchemaTranslation {
+ private static final BiMap<TypeName, RunnerApi.Schema.TypeName> TYPE_NAME_MAPPING =
+ ImmutableBiMap.<TypeName, RunnerApi.Schema.TypeName>builder()
+ .put(TypeName.BYTE, RunnerApi.Schema.TypeName.BYTE)
+ .put(TypeName.INT16, RunnerApi.Schema.TypeName.INT16)
+ .put(TypeName.INT32, RunnerApi.Schema.TypeName.INT32)
+ .put(TypeName.INT64, RunnerApi.Schema.TypeName.INT64)
+ .put(TypeName.DECIMAL, RunnerApi.Schema.TypeName.DECIMAL)
+ .put(TypeName.FLOAT, RunnerApi.Schema.TypeName.FLOAT)
+ .put(TypeName.DOUBLE, RunnerApi.Schema.TypeName.DOUBLE)
+ .put(TypeName.STRING, RunnerApi.Schema.TypeName.STRING)
+ .put(TypeName.DATETIME, RunnerApi.Schema.TypeName.DATETIME)
+ .put(TypeName.BOOLEAN, RunnerApi.Schema.TypeName.BOOLEAN)
+ .put(TypeName.BYTES, RunnerApi.Schema.TypeName.BYTES)
+ .put(TypeName.ARRAY, RunnerApi.Schema.TypeName.ARRAY)
+ .put(TypeName.MAP, RunnerApi.Schema.TypeName.MAP)
+ .put(TypeName.ROW, RunnerApi.Schema.TypeName.ROW)
+ .put(TypeName.LOGICAL_TYPE, RunnerApi.Schema.TypeName.LOGICAL_TYPE)
+ .build();
+
+ public static RunnerApi.Schema toProto(Schema schema) {
+ RunnerApi.Schema.Builder builder =
+ RunnerApi.Schema.newBuilder().setId(schema.getUUID().toString());
+ for (Field field : schema.getFields()) {
+ RunnerApi.Schema.Field protoField =
+ toProto(
+ field,
+ schema.indexOf(field.getName()),
+ schema.getEncodingPositions().get(field.getName()));
+ builder.addFields(protoField);
+ }
+ return builder.build();
+ }
+
+ private static RunnerApi.Schema.Field toProto(Field field, int fieldId, int position) {
+ return RunnerApi.Schema.Field.newBuilder()
+ .setName(field.getName())
+ .setDescription(field.getDescription())
+ .setType(toProto(field.getType()))
+ .setId(fieldId)
+ .setEncodingPosition(position)
+ .build();
+ }
+
+ private static RunnerApi.Schema.FieldType toProto(FieldType fieldType) {
+ RunnerApi.Schema.FieldType.Builder builder =
+ RunnerApi.Schema.FieldType.newBuilder()
+ .setTypeName(TYPE_NAME_MAPPING.get(fieldType.getTypeName()));
+ switch (fieldType.getTypeName()) {
+ case ROW:
+ builder.setRowSchema(toProto(fieldType.getRowSchema()));
+ break;
+
+ case ARRAY:
+ builder.setCollectionElementType(toProto(fieldType.getCollectionElementType()));
+ break;
+
+ case MAP:
+ builder.setMapType(
+ RunnerApi.Schema.MapType.newBuilder()
+ .setKeyType(toProto(fieldType.getMapKeyType()))
+ .setValueType(toProto(fieldType.getMapValueType()))
+ .build());
+ break;
+
+ case LOGICAL_TYPE:
+ LogicalType logicalType = fieldType.getLogicalType();
+ builder.setLogicalType(
+ RunnerApi.Schema.LogicalType.newBuilder()
+ .setId(logicalType.getIdentifier())
+ .setArgs(logicalType.getArgument())
+ .setBaseType(toProto(logicalType.getBaseType()))
+ .setSerializedClass(
+ ByteString.copyFrom(SerializableUtils.serializeToByteArray(logicalType)))
+ .build());
+ break;
+
+ default:
+ break;
+ }
+ return builder.build();
+ }
+
+ public static Schema fromProto(RunnerApi.Schema protoSchema) {
+ Schema.Builder builder = Schema.builder();
+ Map<String, Integer> encodingLocationMap = Maps.newHashMap();
+ for (RunnerApi.Schema.Field protoField : protoSchema.getFieldsList()) {
+ Field field = fieldFromProto(protoField);
+ builder.addField(field);
+ encodingLocationMap.put(protoField.getName(), protoField.getEncodingPosition());
+ }
+ Schema schema = builder.build();
+ schema.setEncodingPositions(encodingLocationMap);
+ schema.setUUID(UUID.fromString(protoSchema.getId()));
+
+ return schema;
+ }
+
+ private static Field fieldFromProto(RunnerApi.Schema.Field protoField) {
+ return Field.of(protoField.getName(), fieldTypeFromProto(protoField.getType()))
+ .withDescription(protoField.getDescription());
+ }
+
+ private static FieldType fieldTypeFromProto(RunnerApi.Schema.FieldType protoFieldType) {
+ TypeName typeName = TYPE_NAME_MAPPING.inverse().get(protoFieldType.getTypeName());
+ FieldType fieldType;
+ switch (typeName) {
+ case ROW:
+ fieldType = FieldType.row(fromProto(protoFieldType.getRowSchema()));
+ break;
+ case ARRAY:
+ fieldType = FieldType.array(fieldTypeFromProto(protoFieldType.getCollectionElementType()));
+ break;
+ case MAP:
+ fieldType =
+ FieldType.map(
+ fieldTypeFromProto(protoFieldType.getMapType().getKeyType()),
+ fieldTypeFromProto(protoFieldType.getMapType().getValueType()));
+ break;
+ case LOGICAL_TYPE:
+ LogicalType logicalType =
+ (LogicalType)
+ SerializableUtils.deserializeFromByteArray(
+ protoFieldType.getLogicalType().getSerializedClass().toByteArray(),
+ "logicalType");
+ fieldType = FieldType.logicalType(logicalType);
+ break;
+ default:
+ fieldType = FieldType.of(typeName);
+ }
+ if (protoFieldType.getNullable()) {
+ fieldType = fieldType.withNullable(true);
+ }
+ return fieldType;
+ }
+}
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
index 4b3785d..35ba2ff 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
@@ -111,8 +111,7 @@
}
}
- static class NaiveProcessFn<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
+ static class NaiveProcessFn<InputT, OutputT, RestrictionT, PositionT>
extends DoFn<KV<InputT, RestrictionT>, OutputT> {
private final DoFn<InputT, OutputT> fn;
@@ -144,7 +143,7 @@
InputT element = c.element().getKey();
RestrictionT restriction = c.element().getValue();
while (true) {
- TrackerT tracker = invoker.invokeNewTracker(restriction);
+ RestrictionTracker<RestrictionT, PositionT> tracker = invoker.invokeNewTracker(restriction);
ProcessContinuation continuation =
invoker.invokeProcessElement(new NestedProcessContext<>(fn, c, element, w, tracker));
if (continuation.shouldResume()) {
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
index f96c977..0ec7588 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
@@ -35,6 +35,7 @@
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.LengthPrefixCoder;
@@ -70,6 +71,7 @@
.add(
FullWindowedValueCoder.of(
IterableCoder.of(VarLongCoder.of()), IntervalWindowCoder.of()))
+ .add(DoubleCoder.of())
.build();
/**
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
index 080e7b7..3e129aa 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
@@ -51,6 +51,7 @@
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.Coder.Context;
import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VarLongCoder;
@@ -90,6 +91,7 @@
.put(getUrn(StandardCoders.Enum.ITERABLE), IterableCoder.class)
.put(getUrn(StandardCoders.Enum.TIMER), Timer.Coder.class)
.put(getUrn(StandardCoders.Enum.GLOBAL_WINDOW), GlobalWindow.Coder.class)
+ .put(getUrn(StandardCoders.Enum.DOUBLE), DoubleCoder.class)
.put(
getUrn(StandardCoders.Enum.WINDOWED_VALUE),
WindowedValue.FullWindowedValueCoder.class)
@@ -270,6 +272,8 @@
(int) paneInfoMap.get("index"),
(int) paneInfoMap.get("on_time_index"));
return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
+ } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
+ return Double.parseDouble((String) value);
} else {
throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
}
@@ -298,6 +302,8 @@
} else if (s.equals(getUrn(StandardCoders.Enum.WINDOWED_VALUE))) {
return WindowedValue.FullWindowedValueCoder.of(
components.get(0), (Coder<BoundedWindow>) components.get(1));
+ } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
+ return DoubleCoder.of();
} else {
throw new IllegalStateException("Unknown coder URN: " + coder.getUrn());
}
@@ -357,6 +363,9 @@
} else if (s.equals(getUrn(StandardCoders.Enum.WINDOWED_VALUE))) {
assertEquals(expectedValue, actualValue);
+ } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
+      assertEquals(expectedValue, actualValue);
} else {
throw new IllegalStateException("Unknown coder URN: " + coder.getUrn());
}
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
index 618a12e..7f4ebda 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
@@ -163,7 +163,8 @@
private DoFn<KV<String, Integer>, Integer> splittableDoFn =
new DoFn<KV<String, Integer>, Integer>() {
@ProcessElement
- public void processElement(ProcessContext context, SomeTracker tracker) {}
+ public void processElement(
+ ProcessContext context, RestrictionTracker<Void, Void> tracker) {}
@GetInitialRestriction
public Void getInitialRestriction(KV<String, Integer> element) {
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java
index 68365c8..959120c 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java
@@ -58,7 +58,7 @@
}
@Override
- protected boolean tryClaimImpl(Void position) {
+ public boolean tryClaim(Void position) {
return false;
}
@@ -78,7 +78,8 @@
private static class BoundedFakeFn extends DoFn<Integer, String> {
@ProcessElement
- public void processElement(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void processElement(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@GetInitialRestriction
public SomeRestriction getInitialRestriction(Integer element) {
@@ -89,7 +90,7 @@
private static class UnboundedFakeFn extends DoFn<Integer, String> {
@ProcessElement
public ProcessContinuation processElement(
- ProcessContext context, SomeRestrictionTracker tracker) {
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {
return stop();
}
diff --git a/runners/core-java/build.gradle b/runners/core-java/build.gradle
index 1ab97a3..87e81c9 100644
--- a/runners/core-java/build.gradle
+++ b/runners/core-java/build.gradle
@@ -35,6 +35,7 @@
shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
shadow project(path: ":beam-model-fn-execution", configuration: "shadow")
shadow project(path: ":beam-runners-core-construction-java", configuration: "shadow")
+ shadow project(path: ":beam-sdks-java-fn-execution", configuration: "shadow")
shadow library.java.vendored_guava_20_0
shadow library.java.joda_time
shadowTest project(path: ":beam-sdks-java-core", configuration: "shadowTest")
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java
index f9a2b69..2009f90 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java
@@ -25,6 +25,7 @@
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.annotation.Nullable;
+import org.apache.beam.sdk.fn.splittabledofn.RestrictionTrackers;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.state.State;
import org.apache.beam.sdk.state.TimeDomain;
@@ -55,12 +56,8 @@
* outputs), or runs for the given duration.
*/
public class OutputAndTimeBoundedSplittableProcessElementInvoker<
- InputT,
- OutputT,
- RestrictionT,
- PositionT,
- TrackerT extends RestrictionTracker<RestrictionT, PositionT>>
- extends SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, TrackerT> {
+ InputT, OutputT, RestrictionT, PositionT>
+ extends SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT> {
private final DoFn<InputT, OutputT> fn;
private final PipelineOptions pipelineOptions;
private final OutputWindowedValue<OutputT> output;
@@ -106,9 +103,9 @@
public Result invokeProcessElement(
DoFnInvoker<InputT, OutputT> invoker,
final WindowedValue<InputT> element,
- final TrackerT tracker) {
+ final RestrictionTracker<RestrictionT, PositionT> tracker) {
final ProcessContext processContext = new ProcessContext(element, tracker);
- tracker.setClaimObserver(processContext);
+
DoFn.ProcessContinuation cont =
invoker.invokeProcessElement(
new DoFnInvoker.ArgumentProvider<InputT, OutputT>() {
@@ -156,7 +153,7 @@
@Override
public RestrictionTracker<?, ?> restrictionTracker() {
- return tracker;
+ return processContext.tracker;
}
// Unsupported methods below.
@@ -226,7 +223,7 @@
// restriction that describes exactly the work that wasn't done in the current call.
if (processContext.numClaimedBlocks > 0) {
residual = checkNotNull(processContext.takeCheckpointNow());
- tracker.checkDone();
+ processContext.tracker.checkDone();
} else {
// The call returned resume() without trying to claim any blocks, i.e. it is unaware
// of any work to be done at the moment, but more might emerge later. This is a valid
@@ -254,14 +251,14 @@
// ProcessElement call.
// In other words, if we took a checkpoint *after* ProcessElement completed (like in the
// branch above), it would have been equivalent to this one.
- tracker.checkDone();
+ processContext.tracker.checkDone();
}
} else {
// The ProcessElement call returned stop() - that means the tracker's current restriction
// has been fully processed by the call. A checkpoint may or may not have been taken in
// "residual"; if it was, then we'll need to process it; if no, then we don't - nothing
// special needs to be done.
- tracker.checkDone();
+ processContext.tracker.checkDone();
}
if (residual == null) {
// Can only be true if cont.shouldResume() is false and no checkpoint was taken.
@@ -273,9 +270,9 @@
}
private class ProcessContext extends DoFn<InputT, OutputT>.ProcessContext
- implements RestrictionTracker.ClaimObserver<PositionT> {
+ implements RestrictionTrackers.ClaimObserver<PositionT> {
private final WindowedValue<InputT> element;
- private final TrackerT tracker;
+ private final RestrictionTracker<RestrictionT, PositionT> tracker;
private int numClaimedBlocks;
private boolean hasClaimFailed;
@@ -293,10 +290,11 @@
private @Nullable Future<?> scheduledCheckpoint;
private @Nullable Instant lastReportedWatermark;
- public ProcessContext(WindowedValue<InputT> element, TrackerT tracker) {
+ public ProcessContext(
+ WindowedValue<InputT> element, RestrictionTracker<RestrictionT, PositionT> tracker) {
fn.super();
this.element = element;
- this.tracker = tracker;
+ this.tracker = RestrictionTrackers.observe(tracker, this);
}
@Override
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java
index 2740bcc..6e30ab1 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java
@@ -155,8 +155,7 @@
}
/** A primitive transform wrapping around {@link ProcessFn}. */
- public static class ProcessElements<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
+ public static class ProcessElements<InputT, OutputT, RestrictionT, PositionT>
extends PTransform<
PCollection<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>>, PCollectionTuple> {
private final ProcessKeyedElements<InputT, OutputT, RestrictionT> original;
@@ -165,7 +164,7 @@
this.original = original;
}
- public ProcessFn<InputT, OutputT, RestrictionT, TrackerT> newProcessFn(
+ public ProcessFn<InputT, OutputT, RestrictionT, PositionT> newProcessFn(
DoFn<InputT, OutputT> fn) {
return new ProcessFn<>(
fn,
@@ -216,8 +215,7 @@
* <p>See also: https://issues.apache.org/jira/browse/BEAM-1983
*/
@VisibleForTesting
- public static class ProcessFn<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
+ public static class ProcessFn<InputT, OutputT, RestrictionT, PositionT>
extends DoFn<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>, OutputT> {
/**
* The state cell containing a watermark hold for the output of this {@link DoFn}. The hold is
@@ -254,7 +252,7 @@
private transient @Nullable StateInternalsFactory<byte[]> stateInternalsFactory;
private transient @Nullable TimerInternalsFactory<byte[]> timerInternalsFactory;
private transient @Nullable SplittableProcessElementInvoker<
- InputT, OutputT, RestrictionT, TrackerT>
+ InputT, OutputT, RestrictionT, PositionT>
processElementInvoker;
private transient @Nullable DoFnInvoker<InputT, OutputT> invoker;
@@ -285,7 +283,7 @@
}
public void setProcessElementInvoker(
- SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, TrackerT> invoker) {
+ SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT> invoker) {
this.processElementInvoker = invoker;
}
@@ -326,6 +324,13 @@
invoker.invokeFinishBundle(wrapContextAsFinishBundle(c));
}
+ /**
+ * Processes an element and restriction pair storing the restriction inside of state.
+ *
+ * <p>Uses a processing timer to resume execution if processing returns a continuation.
+ *
+ * <p>Uses a watermark hold to control watermark advancement.
+ */
@ProcessElement
public void processElement(final ProcessContext c) {
byte[] key = c.element().key();
@@ -370,8 +375,9 @@
elementAndRestriction = KV.of(elementState.read(), restrictionState.read());
}
- final TrackerT tracker = invoker.invokeNewTracker(elementAndRestriction.getValue());
- SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, TrackerT>.Result result =
+ final RestrictionTracker<RestrictionT, PositionT> tracker =
+ invoker.invokeNewTracker(elementAndRestriction.getValue());
+ SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT>.Result result =
processElementInvoker.invokeProcessElement(
invoker, elementAndRestriction.getKey(), tracker);
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableProcessElementInvoker.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableProcessElementInvoker.java
index b3958e8..d877970 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableProcessElementInvoker.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableProcessElementInvoker.java
@@ -31,8 +31,7 @@
* A runner-specific hook for invoking a {@link DoFn.ProcessElement} method for a splittable {@link
* DoFn}, in particular, allowing the runner to access the {@link RestrictionTracker}.
*/
-public abstract class SplittableProcessElementInvoker<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>> {
+public abstract class SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT> {
/** Specifies how to resume a splittable {@link DoFn.ProcessElement} call. */
public class Result {
@Nullable private final RestrictionT residualRestriction;
@@ -84,5 +83,7 @@
* DoFn.ProcessContinuation}, and a future output watermark.
*/
public abstract Result invokeProcessElement(
- DoFnInvoker<InputT, OutputT> invoker, WindowedValue<InputT> element, TrackerT tracker);
+ DoFnInvoker<InputT, OutputT> invoker,
+ WindowedValue<InputT> element,
+ RestrictionTracker<RestrictionT, PositionT> tracker);
}
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java
index ba47812..4ed5fc4 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java
@@ -17,21 +17,19 @@
*/
package org.apache.beam.runners.core.metrics;
-import static org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX;
-
import org.apache.beam.sdk.metrics.MetricName;
/** Utility for parsing a URN to a {@link MetricName}. */
public class MetricUrns {
/**
* Parse a {@link MetricName} from a {@link
- * org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoUrns.Enum}.
+ * org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs.Enum}.
*
* <p>Should be consistent with {@code parse_namespace_and_name} in monitoring_infos.py.
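   * <p>For example, the user counter URN {@code "beam:metric:user:myNamespace:myName"} is parsed
   * into a {@link MetricName} with namespace {@code "myNamespace"} and name {@code "myName"}.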
*/
public static MetricName parseUrn(String urn) {
- if (urn.startsWith(USER_COUNTER_URN_PREFIX)) {
- urn = urn.substring(USER_COUNTER_URN_PREFIX.length());
+ if (urn.startsWith(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX)) {
+ urn = urn.substring(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX.length());
}
// If it is not a user counter, just use the first part of the URN, i.e. 'beam'
String[] pieces = urn.split(":", 2);
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
new file mode 100644
index 0000000..66695e9
--- /dev/null
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
+import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
+
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+/** Provides MonitoringInfo-related constants extracted from metrics.proto. */
+public final class MonitoringInfoConstants {
+
+ /** Supported MonitoringInfo Urns. */
+ public static final class Urns {
+ public static final String ELEMENT_COUNT = extractUrn(MonitoringInfoSpecs.Enum.ELEMENT_COUNT);
+ public static final String START_BUNDLE_MSECS =
+ extractUrn(MonitoringInfoSpecs.Enum.START_BUNDLE_MSECS);
+ public static final String PROCESS_BUNDLE_MSECS =
+ extractUrn(MonitoringInfoSpecs.Enum.PROCESS_BUNDLE_MSECS);
+ public static final String FINISH_BUNDLE_MSECS =
+ extractUrn(MonitoringInfoSpecs.Enum.FINISH_BUNDLE_MSECS);
+ public static final String TOTAL_MSECS = extractUrn(MonitoringInfoSpecs.Enum.TOTAL_MSECS);
+ public static final String USER_COUNTER_PREFIX =
+ extractUrn(MonitoringInfoSpecs.Enum.USER_COUNTER);
+ public static final String USER_DISTRIBUTION_COUNTER_PREFIX =
+ extractUrn(MonitoringInfoSpecs.Enum.USER_DISTRIBUTION_COUNTER);
+ }
+
+ /** Standardized MonitoringInfo labels that can be used by runners. */
+ public static final class Labels {
+ public static final String PTRANSFORM = extractLabel(MonitoringInfoLabels.TRANSFORM);
+ public static final String PCOLLECTION = extractLabel(MonitoringInfoLabels.PCOLLECTION);
+ public static final String WINDOWING_STRATEGY =
+ extractLabel(MonitoringInfoLabels.WINDOWING_STRATEGY);
+ public static final String CODER = extractLabel(MonitoringInfoLabels.CODER);
+ public static final String ENVIRONMENT = extractLabel(MonitoringInfoLabels.ENVIRONMENT);
+ }
+
+ /** MonitoringInfo type Urns. */
+ public static final class TypeUrns {
+ public static final String SUM_INT64 = extractLabel(MonitoringInfoTypeUrns.Enum.SUM_INT64_TYPE);
+ public static final String DISTRIBUTION_INT64 =
+ extractLabel(MonitoringInfoTypeUrns.Enum.DISTRIBUTION_INT64_TYPE);
+ public static final String LATEST_INT64 =
+ extractLabel(MonitoringInfoTypeUrns.Enum.LATEST_INT64_TYPE);
+ }
+
+ private static String extractUrn(MonitoringInfoSpecs.Enum value) {
+ return value.getValueDescriptor().getOptions().getExtension(monitoringInfoSpec).getUrn();
+ }
+
+ private static String extractLabel(MonitoringInfo.MonitoringInfoLabels value) {
+ return value.getValueDescriptor().getOptions().getExtension(labelProps).getName();
+ }
+
+ private static String extractLabel(MonitoringInfoTypeUrns.Enum value) {
+ return value.getValueDescriptor().getOptions().getExtension(RunnerApi.beamUrn);
+ }
+}
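
These constants replace the string constants previously exposed on SimpleMonitoringInfoBuilder. A small sketch, mirroring MonitoringInfoTestUtil further down in this patch, of how a runner builds an ELEMENT_COUNT MonitoringInfo with them (the wrapper class and method name are hypothetical):

    import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
    import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
    import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;

    class ElementCountMonitoringInfos {
      // Builds an element-count MonitoringInfo labeled with the given PCollection.
      static MonitoringInfo elementCount(String pCollection, long value) {
        SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
        builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
        builder.setPCollectionLabel(pCollection);
        builder.setInt64Value(value);
        // build() returns null if the MonitoringInfoSpec's requirements are not met.
        return builder.build();
      }
    }
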
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java
index 4583c9d..f0a20e1 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java
@@ -54,7 +54,7 @@
/** Parse the urn field into a name and namespace field. */
private void parseUrn() {
- if (this.urn.startsWith(SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX)) {
+ if (this.urn.startsWith(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX)) {
List<String> split = new ArrayList<String>(Arrays.asList(this.getUrn().split(":")));
this.name = split.get(split.size() - 1);
this.namespace = split.get(split.size() - 2);
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java
index 9ae6741..135e5c3 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleExecutionState.java
@@ -90,7 +90,7 @@
public String getLullMessage(Thread trackedThread, Duration millis) {
// TODO(ajamato): Share getLullMessage code with DataflowExecutionState.
String userStepName =
- this.labelsMetadata.getOrDefault(SimpleMonitoringInfoBuilder.PTRANSFORM_LABEL, null);
+ this.labelsMetadata.getOrDefault(MonitoringInfoConstants.Labels.PTRANSFORM, null);
StringBuilder message = new StringBuilder();
message.append("Processing stuck");
if (userStepName != null) {
diff --git a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java
index 92893d9..8ab8403 100644
--- a/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java
+++ b/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java
@@ -17,23 +17,15 @@
*/
package org.apache.beam.runners.core.metrics;
-import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
import java.time.Instant;
import java.util.HashMap;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoLabelProps;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpec;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoUrns;
-import org.apache.beam.runners.core.construction.BeamUrns;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
/**
* Simplified building of MonitoringInfo fields, allows setting one field at a time with simpler
@@ -60,31 +52,12 @@
* builder.setInt64Value(1); MonitoringInfo mi = builder.build();
*/
public class SimpleMonitoringInfoBuilder {
- public static final String ELEMENT_COUNT_URN =
- BeamUrns.getUrn(MonitoringInfoUrns.Enum.ELEMENT_COUNT);
- public static final String START_BUNDLE_MSECS_URN =
- BeamUrns.getUrn(MonitoringInfoUrns.Enum.START_BUNDLE_MSECS);
- public static final String PROCESS_BUNDLE_MSECS_URN =
- BeamUrns.getUrn(MonitoringInfoUrns.Enum.PROCESS_BUNDLE_MSECS);
- public static final String FINISH_BUNDLE_MSECS_URN =
- BeamUrns.getUrn(MonitoringInfoUrns.Enum.FINISH_BUNDLE_MSECS);
- public static final String USER_COUNTER_URN_PREFIX =
- BeamUrns.getUrn(MonitoringInfoUrns.Enum.USER_COUNTER_URN_PREFIX);
- public static final String SUM_INT64_TYPE_URN =
- BeamUrns.getUrn(MonitoringInfoTypeUrns.Enum.SUM_INT64_TYPE);
+ private final boolean validateAndDropInvalid;
private static final HashMap<String, MonitoringInfoSpec> specs =
new HashMap<String, MonitoringInfoSpec>();
- public static final String PCOLLECTION_LABEL = getLabelString(MonitoringInfoLabels.PCOLLECTION);
- public static final String PTRANSFORM_LABEL = getLabelString(MonitoringInfoLabels.TRANSFORM);
-
- private final boolean validateAndDropInvalid;
-
- private static final Logger LOG = LoggerFactory.getLogger(SimpleMonitoringInfoBuilder.class);
-
private MonitoringInfo.Builder builder;
-
private SpecMonitoringInfoValidator validator = new SpecMonitoringInfoValidator();
static {
@@ -99,13 +72,6 @@
}
}
- /** Returns the label string constant defined in the MonitoringInfoLabel enum proto. */
- private static String getLabelString(MonitoringInfoLabels label) {
- MonitoringInfoLabelProps props =
- label.getValueDescriptor().getOptions().getExtension(labelProps);
- return props.getName();
- }
-
public SimpleMonitoringInfoBuilder() {
this(true);
}
@@ -120,7 +86,7 @@
String fixedMetricNamespace = metricNamespace.replace(':', '_');
String fixedMetricName = metricName.replace(':', '_');
StringBuilder sb = new StringBuilder();
- sb.append(USER_COUNTER_URN_PREFIX);
+ sb.append(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX);
sb.append(fixedMetricNamespace);
sb.append(':');
sb.append(fixedMetricName);
@@ -164,20 +130,20 @@
  /** Sets the appropriate type URN for sum int64 counters. */
public SimpleMonitoringInfoBuilder setInt64TypeUrn() {
- this.builder.setType(SUM_INT64_TYPE_URN);
+ this.builder.setType(MonitoringInfoConstants.TypeUrns.SUM_INT64);
return this;
}
/** Sets the PTRANSFORM MonitoringInfo label to the given param. */
public SimpleMonitoringInfoBuilder setPTransformLabel(String pTransform) {
// TODO(ajamato): Add validation that it is a valid pTransform name in the bundle descriptor.
- setLabel(PTRANSFORM_LABEL, pTransform);
+ setLabel(MonitoringInfoConstants.Labels.PTRANSFORM, pTransform);
return this;
}
/** Sets the PCOLLECTION MonitoringInfo label to the given param. */
public SimpleMonitoringInfoBuilder setPCollectionLabel(String pCollection) {
- setLabel(PCOLLECTION_LABEL, pCollection);
+ setLabel(MonitoringInfoConstants.Labels.PCOLLECTION, pCollection);
return this;
}
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java
index c54080c..a05aa8d 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvokerTest.java
@@ -35,6 +35,7 @@
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;
import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
@@ -66,7 +67,8 @@
}
@ProcessElement
- public ProcessContinuation process(ProcessContext context, OffsetRangeTracker tracker) {
+ public ProcessContinuation process(
+ ProcessContext context, RestrictionTracker<OffsetRange, Long> tracker) {
Uninterruptibles.sleepUninterruptibly(
sleepBeforeFirstClaim.getMillis(), TimeUnit.MILLISECONDS);
for (long i = tracker.currentRestriction().getFrom(), numIterations = 1;
@@ -88,20 +90,19 @@
}
}
- private SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result
- runTest(
- int totalNumOutputs,
- Duration sleepBeforeFirstClaim,
- int numOutputsPerProcessCall,
- Duration sleepBeforeEachOutput) {
+ private SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result runTest(
+ int totalNumOutputs,
+ Duration sleepBeforeFirstClaim,
+ int numOutputsPerProcessCall,
+ Duration sleepBeforeEachOutput) {
SomeFn fn = new SomeFn(sleepBeforeFirstClaim, numOutputsPerProcessCall, sleepBeforeEachOutput);
OffsetRange initialRestriction = new OffsetRange(0, totalNumOutputs);
return runTest(fn, initialRestriction);
}
- private SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result
- runTest(DoFn<Void, String> fn, OffsetRange initialRestriction) {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker> invoker =
+ private SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result runTest(
+ DoFn<Void, String> fn, OffsetRange initialRestriction) {
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long> invoker =
new OutputAndTimeBoundedSplittableProcessElementInvoker<>(
fn,
PipelineOptionsFactory.create(),
@@ -134,7 +135,7 @@
@Test
public void testInvokeProcessElementOutputBounded() throws Exception {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result res =
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result res =
runTest(10000, Duration.ZERO, Integer.MAX_VALUE, Duration.ZERO);
assertFalse(res.getContinuation().shouldResume());
OffsetRange residualRange = res.getResidualRestriction();
@@ -145,7 +146,7 @@
@Test
public void testInvokeProcessElementTimeBounded() throws Exception {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result res =
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result res =
runTest(10000, Duration.ZERO, Integer.MAX_VALUE, Duration.millis(100));
assertFalse(res.getContinuation().shouldResume());
OffsetRange residualRange = res.getResidualRestriction();
@@ -158,7 +159,7 @@
@Test
public void testInvokeProcessElementTimeBoundedWithStartupDelay() throws Exception {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result res =
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result res =
runTest(10000, Duration.standardSeconds(3), Integer.MAX_VALUE, Duration.millis(100));
assertFalse(res.getContinuation().shouldResume());
OffsetRange residualRange = res.getResidualRestriction();
@@ -170,7 +171,7 @@
@Test
public void testInvokeProcessElementVoluntaryReturnStop() throws Exception {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result res =
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result res =
runTest(5, Duration.ZERO, Integer.MAX_VALUE, Duration.millis(100));
assertFalse(res.getContinuation().shouldResume());
assertNull(res.getResidualRestriction());
@@ -178,7 +179,7 @@
@Test
public void testInvokeProcessElementVoluntaryReturnResume() throws Exception {
- SplittableProcessElementInvoker<Void, String, OffsetRange, OffsetRangeTracker>.Result res =
+ SplittableProcessElementInvoker<Void, String, OffsetRange, Long>.Result res =
runTest(10, Duration.ZERO, 5, Duration.millis(100));
assertTrue(res.getContinuation().shouldResume());
assertEquals(new OffsetRange(5, 10), res.getResidualRestriction());
@@ -189,7 +190,7 @@
DoFn<Void, String> brokenFn =
new DoFn<Void, String>() {
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
c.output("foo");
}
@@ -207,7 +208,7 @@
DoFn<Void, String> brokenFn =
new DoFn<Void, String>() {
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
assertFalse(tracker.tryClaim(6L));
c.output("foo");
}
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java
index 7702db3..ae383a6 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/SplittableParDoProcessFnTest.java
@@ -45,6 +45,7 @@
import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.range.OffsetRange;
+import org.apache.beam.sdk.testing.ResetDateTimeProvider;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.DoFnTester;
@@ -75,6 +76,7 @@
public class SplittableParDoProcessFnTest {
private static final int MAX_OUTPUTS_PER_BUNDLE = 10000;
private static final Duration MAX_BUNDLE_DURATION = Duration.standardSeconds(5);
+ @Rule public final ResetDateTimeProvider dateTimeProvider = new ResetDateTimeProvider();
// ----------------- Tests for whether the transform sets boundedness correctly --------------
private static class SomeRestriction
@@ -93,7 +95,7 @@
}
@Override
- protected boolean tryClaimImpl(Void position) {
+ public boolean tryClaim(Void position) {
return true;
}
@@ -117,12 +119,7 @@
* A helper for testing {@link ProcessFn} on 1 element (but possibly over multiple {@link
* DoFn.ProcessElement} calls).
*/
- private static class ProcessFnTester<
- InputT,
- OutputT,
- RestrictionT,
- PositionT,
- TrackerT extends RestrictionTracker<RestrictionT, PositionT>>
+ private static class ProcessFnTester<InputT, OutputT, RestrictionT, PositionT>
implements AutoCloseable {
private final DoFnTester<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>, OutputT> tester;
private Instant currentProcessingTime;
@@ -142,7 +139,7 @@
// encode IntervalWindow's because that's what all tests here use.
WindowingStrategy<InputT, BoundedWindow> windowingStrategy =
(WindowingStrategy) WindowingStrategy.of(FixedWindows.of(Duration.standardSeconds(1)));
- final ProcessFn<InputT, OutputT, RestrictionT, TrackerT> processFn =
+ final ProcessFn<InputT, OutputT, RestrictionT, PositionT> processFn =
new ProcessFn<>(fn, inputCoder, restrictionCoder, windowingStrategy);
this.tester = DoFnTester.of(processFn);
this.timerInternals = new InMemoryTimerInternals();
@@ -270,7 +267,7 @@
/** A simple splittable {@link DoFn} that's actually monolithic. */
private static class ToStringFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext c, SomeRestrictionTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<SomeRestriction, Void> tracker) {
checkState(tracker.tryClaim(null));
c.output(c.element().toString() + "a");
c.output(c.element().toString() + "b");
@@ -296,7 +293,7 @@
new IntervalWindow(
base.minus(Duration.standardMinutes(1)), base.plus(Duration.standardMinutes(1)));
- ProcessFnTester<Integer, String, SomeRestriction, Void, SomeRestrictionTracker> tester =
+ ProcessFnTester<Integer, String, SomeRestriction, Void> tester =
new ProcessFnTester<>(
base,
fn,
@@ -321,7 +318,7 @@
private static class WatermarkUpdateFn extends DoFn<Instant, String> {
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); ++i) {
c.updateWatermark(c.element().plus(Duration.standardSeconds(i)));
c.output(String.valueOf(i));
@@ -344,7 +341,7 @@
DoFn<Instant, String> fn = new WatermarkUpdateFn();
Instant base = Instant.now();
- ProcessFnTester<Instant, String, OffsetRange, Long, OffsetRangeTracker> tester =
+ ProcessFnTester<Instant, String, OffsetRange, Long> tester =
new ProcessFnTester<>(
base,
fn,
@@ -369,7 +366,8 @@
/** A simple splittable {@link DoFn} that outputs the given element every 5 seconds forever. */
private static class SelfInitiatedResumeFn extends DoFn<Integer, String> {
@ProcessElement
- public ProcessContinuation process(ProcessContext c, SomeRestrictionTracker tracker) {
+ public ProcessContinuation process(
+ ProcessContext c, RestrictionTracker<SomeRestriction, Void> tracker) {
checkState(tracker.tryClaim(null));
c.output(c.element().toString());
return resume().withResumeDelay(Duration.standardSeconds(5));
@@ -385,7 +383,8 @@
public void testResumeSetsTimer() throws Exception {
DoFn<Integer, String> fn = new SelfInitiatedResumeFn();
Instant base = Instant.now();
- ProcessFnTester<Integer, String, SomeRestriction, Void, SomeRestrictionTracker> tester =
+ dateTimeProvider.setDateTimeFixed(base.getMillis());
+ ProcessFnTester<Integer, String, SomeRestriction, Void> tester =
new ProcessFnTester<>(
base,
fn,
@@ -423,7 +422,8 @@
}
@ProcessElement
- public ProcessContinuation process(ProcessContext c, OffsetRangeTracker tracker) {
+ public ProcessContinuation process(
+ ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
for (long i = tracker.currentRestriction().getFrom(), numIterations = 0;
tracker.tryClaim(i);
++i, ++numIterations) {
@@ -445,7 +445,8 @@
public void testResumeCarriesOverState() throws Exception {
DoFn<Integer, String> fn = new CounterFn(1);
Instant base = Instant.now();
- ProcessFnTester<Integer, String, OffsetRange, Long, OffsetRangeTracker> tester =
+ dateTimeProvider.setDateTimeFixed(base.getMillis());
+ ProcessFnTester<Integer, String, OffsetRange, Long> tester =
new ProcessFnTester<>(
base,
fn,
@@ -474,7 +475,7 @@
Instant base = Instant.now();
int baseIndex = 42;
- ProcessFnTester<Integer, String, OffsetRange, Long, OffsetRangeTracker> tester =
+ ProcessFnTester<Integer, String, OffsetRange, Long> tester =
new ProcessFnTester<>(
base,
fn,
@@ -520,7 +521,7 @@
Instant base = Instant.now();
int baseIndex = 42;
- ProcessFnTester<Integer, String, OffsetRange, Long, OffsetRangeTracker> tester =
+ ProcessFnTester<Integer, String, OffsetRange, Long> tester =
new ProcessFnTester<>(
base,
fn,
@@ -552,7 +553,7 @@
private State state = State.BEFORE_SETUP;
@ProcessElement
- public void process(ProcessContext c, SomeRestrictionTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<SomeRestriction, Void> tracker) {
assertEquals(State.INSIDE_BUNDLE, state);
}
@@ -589,7 +590,7 @@
@Test
public void testInvokesLifecycleMethods() throws Exception {
DoFn<Integer, String> fn = new LifecycleVerifyingFn();
- try (ProcessFnTester<Integer, String, SomeRestriction, Void, SomeRestrictionTracker> tester =
+ try (ProcessFnTester<Integer, String, SomeRestriction, Void> tester =
new ProcessFnTester<>(
Instant.now(),
fn,
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java
index c1704aa..0e6f692 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/LabeledMetricsTest.java
@@ -41,7 +41,7 @@
MetricsEnvironment.setCurrentContainer(null);
assertNull(MetricsEnvironment.getCurrentContainer());
HashMap<String, String> labels = new HashMap<String, String>();
- String urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
Counter counter = LabeledMetrics.counter(name);
@@ -54,7 +54,7 @@
@Test
public void testOperationsUpdateCounterFromContainerWhenContainerIsPresent() {
HashMap<String, String> labels = new HashMap<String, String>();
- String urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
MetricsContainer mockContainer = Mockito.mock(MetricsContainer.class);
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java
index 4409827..28a846d 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java
@@ -182,14 +182,14 @@
public void testMonitoringInfosArePopulatedForABeamCounter() {
MetricsContainerImpl testObject = new MetricsContainerImpl("step1");
HashMap<String, String> labels = new HashMap<String, String>();
- labels.put(SimpleMonitoringInfoBuilder.PCOLLECTION_LABEL, "pcollection");
+ labels.put(MonitoringInfoConstants.Labels.PCOLLECTION, "pcollection");
MetricName name =
- MonitoringInfoMetricName.named(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN, labels);
+ MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels);
CounterCell c1 = testObject.getCounter(name);
c1.inc(2L);
SimpleMonitoringInfoBuilder builder1 = new SimpleMonitoringInfoBuilder();
- builder1.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder1.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder1.setPCollectionLabel("pcollection");
builder1.setInt64Value(2);
builder1.build();
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java
index 13dd79d..fa1b989 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricNameTest.java
@@ -34,7 +34,7 @@
@Test
public void testElementCountConstruction() {
HashMap<String, String> labels = new HashMap<String, String>();
- String urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
assertEquals(null, name.getName());
assertEquals(null, name.getNamespace());
@@ -44,7 +44,7 @@
assertEquals(name, name); // test self equals;
// Reconstruct and test equality and hash code equivalence
- urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
labels = new HashMap<String, String>();
MonitoringInfoMetricName name2 = MonitoringInfoMetricName.named(urn, labels);
@@ -76,11 +76,11 @@
@Test
public void testNotEqualsDiffLabels() {
HashMap<String, String> labels = new HashMap<String, String>();
- String urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
// Reconstruct and test equality and hash code equivalence
- urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
labels = new HashMap<String, String>();
labels.put("label", "value1");
MonitoringInfoMetricName name2 = MonitoringInfoMetricName.named(urn, labels);
@@ -92,7 +92,7 @@
@Test
public void testNotEqualsDiffUrn() {
HashMap<String, String> labels = new HashMap<String, String>();
- String urn = SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
+ String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
// Reconstruct and test equality and hash code equivalence
@@ -108,7 +108,7 @@
public void testNullLabelsThrows() {
thrown.expect(IllegalArgumentException.class);
HashMap<String, String> labels = null;
- MonitoringInfoMetricName.named(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN, labels);
+ MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels);
}
@Test
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java
index 1b337ee..3b33d42 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MonitoringInfoTestUtil.java
@@ -27,16 +27,16 @@
/** @return A basic MonitoringInfoMetricName to test. */
public static MonitoringInfoMetricName testElementCountName() {
HashMap labels = new HashMap<String, String>();
- labels.put(SimpleMonitoringInfoBuilder.PCOLLECTION_LABEL, "testPCollection");
+ labels.put(MonitoringInfoConstants.Labels.PCOLLECTION, "testPCollection");
MonitoringInfoMetricName name =
- MonitoringInfoMetricName.named(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN, labels);
+ MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels);
return name;
}
/** @return A basic MonitoringInfo which matches the testElementCountName. */
public static MonitoringInfo testElementCountMonitoringInfo(long value) {
SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("testPCollection");
builder.setInt64Value(value);
return builder.build();
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java
index 2290619..2faf736 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleExecutionStateTest.java
@@ -59,7 +59,7 @@
@Test
public void testGetLullReturnsARelevantMessageWithStepName() {
HashMap<String, String> labelsMetadata = new HashMap<String, String>();
- labelsMetadata.put(SimpleMonitoringInfoBuilder.PTRANSFORM_LABEL, "myPTransform");
+ labelsMetadata.put(MonitoringInfoConstants.Labels.PTRANSFORM, "myPTransform");
SimpleExecutionState testObject = new SimpleExecutionState("myState", null, labelsMetadata);
String message = testObject.getLullMessage(new Thread(), Duration.millis(100_000));
assertThat(message, containsString("myState"));
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java
index 18e7829..34f5fb4 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java
@@ -33,7 +33,7 @@
@Test
public void testReturnsNullIfSpecRequirementsNotMet() {
SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
assertNull(builder.build());
builder.setInt64Value(1);
@@ -45,9 +45,9 @@
assertTrue(monitoringInfo != null);
assertEquals(
"myPcollection",
- monitoringInfo.getLabelsOrDefault(SimpleMonitoringInfoBuilder.PCOLLECTION_LABEL, null));
- assertEquals(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN, monitoringInfo.getUrn());
- assertEquals(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN, monitoringInfo.getType());
+ monitoringInfo.getLabelsOrDefault(MonitoringInfoConstants.Labels.PCOLLECTION, null));
+ assertEquals(MonitoringInfoConstants.Urns.ELEMENT_COUNT, monitoringInfo.getUrn());
+ assertEquals(MonitoringInfoConstants.TypeUrns.SUM_INT64, monitoringInfo.getType());
assertEquals(1, monitoringInfo.getMetric().getCounterData().getInt64Value());
}
@@ -62,9 +62,9 @@
MonitoringInfo monitoringInfo = builder.build();
assertTrue(monitoringInfo != null);
assertEquals(
- SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX + "myNamespace:myName",
+ MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX + "myNamespace:myName",
monitoringInfo.getUrn());
- assertEquals(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN, monitoringInfo.getType());
+ assertEquals(MonitoringInfoConstants.TypeUrns.SUM_INT64, monitoringInfo.getType());
assertEquals(1, monitoringInfo.getMetric().getCounterData().getInt64Value());
}
@@ -77,9 +77,9 @@
MonitoringInfo monitoringInfo = builder.build();
assertTrue(monitoringInfo != null);
assertEquals(
- SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX + "myNamespace_withInvalidChar:myName",
+ MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX + "myNamespace_withInvalidChar:myName",
monitoringInfo.getUrn());
- assertEquals(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN, monitoringInfo.getType());
+ assertEquals(MonitoringInfoConstants.TypeUrns.SUM_INT64, monitoringInfo.getType());
assertEquals(1, monitoringInfo.getMetric().getCounterData().getInt64Value());
}
}
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java
index ac42caf..2caba26 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleStateRegistryTest.java
@@ -34,21 +34,21 @@
public void testExecutionTimeUrnsBuildMonitoringInfos() throws Exception {
String testPTransformId = "pTransformId";
HashMap<String, String> labelsMetadata = new HashMap<String, String>();
- labelsMetadata.put(SimpleMonitoringInfoBuilder.PTRANSFORM_LABEL, testPTransformId);
+ labelsMetadata.put(MonitoringInfoConstants.Labels.PTRANSFORM, testPTransformId);
SimpleExecutionState startState =
new SimpleExecutionState(
ExecutionStateTracker.START_STATE_NAME,
- SimpleMonitoringInfoBuilder.START_BUNDLE_MSECS_URN,
+ MonitoringInfoConstants.Urns.START_BUNDLE_MSECS,
labelsMetadata);
SimpleExecutionState processState =
new SimpleExecutionState(
ExecutionStateTracker.PROCESS_STATE_NAME,
- SimpleMonitoringInfoBuilder.PROCESS_BUNDLE_MSECS_URN,
+ MonitoringInfoConstants.Urns.PROCESS_BUNDLE_MSECS,
labelsMetadata);
SimpleExecutionState finishState =
new SimpleExecutionState(
ExecutionStateTracker.FINISH_STATE_NAME,
- SimpleMonitoringInfoBuilder.FINISH_BUNDLE_MSECS_URN,
+ MonitoringInfoConstants.Urns.FINISH_BUNDLE_MSECS,
labelsMetadata);
SimpleStateRegistry testObject = new SimpleStateRegistry();
@@ -59,20 +59,20 @@
List<Matcher<MonitoringInfo>> matchers = new ArrayList<Matcher<MonitoringInfo>>();
SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.START_BUNDLE_MSECS_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.START_BUNDLE_MSECS);
builder.setInt64Value(0);
builder.setPTransformLabel(testPTransformId);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
// Check for execution time metrics for the testPTransformId
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.PROCESS_BUNDLE_MSECS_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.PROCESS_BUNDLE_MSECS);
builder.setInt64Value(0);
builder.setPTransformLabel(testPTransformId);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.FINISH_BUNDLE_MSECS_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.FINISH_BUNDLE_MSECS);
builder.setInt64Value(0);
builder.setPTransformLabel(testPTransformId);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
diff --git a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidatorTest.java b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidatorTest.java
index bffcd8a..b3abbc0 100644
--- a/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidatorTest.java
+++ b/runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidatorTest.java
@@ -21,6 +21,8 @@
import static org.junit.Assert.assertTrue;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.TypeUrns;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns;
import org.junit.Before;
import org.junit.Test;
@@ -48,18 +50,18 @@
public void validateReturnsNoErrorOnValidMonitoringInfo() {
MonitoringInfo testInput =
MonitoringInfo.newBuilder()
- .setUrn(SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX + "someCounter")
- .setType(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN)
+ .setUrn(Urns.USER_COUNTER_PREFIX + "someCounter")
+ .setType(TypeUrns.SUM_INT64)
.putLabels("dummy", "value")
.build();
assertFalse(testObject.validate(testInput).isPresent());
testInput =
MonitoringInfo.newBuilder()
- .setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN)
- .setType(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN)
- .putLabels("PTRANSFORM", "value")
- .putLabels("PCOLLECTION", "anotherValue")
+ .setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT)
+ .setType(TypeUrns.SUM_INT64)
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "value")
+ .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "anotherValue")
.build();
assertFalse(testObject.validate(testInput).isPresent());
}
@@ -68,9 +70,9 @@
public void validateReturnsErrorOnInvalidMonitoringInfoLabels() {
MonitoringInfo testInput =
MonitoringInfo.newBuilder()
- .setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN)
- .setType(SimpleMonitoringInfoBuilder.SUM_INT64_TYPE_URN)
- .putLabels("PTRANSFORM", "unexpectedLabel")
+ .setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT)
+ .setType(TypeUrns.SUM_INT64)
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "unexpectedLabel")
.build();
assertTrue(testObject.validate(testInput).isPresent());
}
diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java
index bafcd48..497d4e8 100644
--- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java
+++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java
@@ -33,7 +33,6 @@
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.DoFnSchemaInformation;
-import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.util.WindowedValue;
@@ -47,12 +46,7 @@
import org.joda.time.Duration;
import org.joda.time.Instant;
-class SplittableProcessElementsEvaluatorFactory<
- InputT,
- OutputT,
- RestrictionT,
- PositionT,
- TrackerT extends RestrictionTracker<RestrictionT, PositionT>>
+class SplittableProcessElementsEvaluatorFactory<InputT, OutputT, RestrictionT, PositionT>
implements TransformEvaluatorFactory {
private final ParDoEvaluatorFactory<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>, OutputT>
delegateFactory;
@@ -75,8 +69,8 @@
checkArgument(
ProcessElements.class.isInstance(application.getTransform()),
"No know extraction of the fn from " + application);
- final ProcessElements<InputT, OutputT, RestrictionT, TrackerT> transform =
- (ProcessElements<InputT, OutputT, RestrictionT, TrackerT>)
+ final ProcessElements<InputT, OutputT, RestrictionT, PositionT> transform =
+ (ProcessElements<InputT, OutputT, RestrictionT, PositionT>)
application.getTransform();
return DoFnLifecycleManager.of(transform.newProcessFn(transform.getFn()));
}
@@ -112,11 +106,11 @@
AppliedPTransform<
PCollection<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>>,
PCollectionTuple,
- ProcessElements<InputT, OutputT, RestrictionT, TrackerT>>
+ ProcessElements<InputT, OutputT, RestrictionT, PositionT>>
application,
CommittedBundle<InputT> inputBundle)
throws Exception {
- final ProcessElements<InputT, OutputT, RestrictionT, TrackerT> transform =
+ final ProcessElements<InputT, OutputT, RestrictionT, PositionT> transform =
application.getTransform();
final DoFnLifecycleManagerRemovingTransformEvaluator<
@@ -133,8 +127,8 @@
DoFnSchemaInformation.create());
final ParDoEvaluator<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>> pde =
evaluator.getParDoEvaluator();
- final ProcessFn<InputT, OutputT, RestrictionT, TrackerT> processFn =
- (ProcessFn<InputT, OutputT, RestrictionT, TrackerT>)
+ final ProcessFn<InputT, OutputT, RestrictionT, PositionT> processFn =
+ (ProcessFn<InputT, OutputT, RestrictionT, PositionT>)
ProcessFnRunner.class.cast(pde.getFnRunner()).getFn();
final DirectExecutionContext.DirectStepContext stepContext = pde.getStepContext();
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
index 7bd2709..f3dc60f 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
@@ -17,9 +17,8 @@
*/
package org.apache.beam.runners.flink;
-import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkNotNull;
+import static org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.hasUnboundedPCollections;
-import java.util.Collection;
import java.util.List;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.RunnerApi;
@@ -89,14 +88,4 @@
return FlinkRunner.createPipelineResult(result, pipelineOptions);
}
-
- /** Indicates whether the given pipeline has any unbounded PCollections. */
- private static boolean hasUnboundedPCollections(RunnerApi.Pipeline pipeline) {
- checkNotNull(pipeline);
- Collection<RunnerApi.PCollection> pCollecctions =
- pipeline.getComponents().getPcollectionsMap().values();
- // Assume that all PCollections are consumed at some point in the pipeline.
- return pCollecctions.stream()
- .anyMatch(pc -> pc.getIsBounded() == RunnerApi.IsBounded.Enum.UNBOUNDED);
- }
}
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
index e0abd26..8d42d18 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
@@ -66,7 +66,6 @@
import org.apache.beam.sdk.transforms.join.UnionCoder;
import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
-import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.transforms.windowing.WindowFn;
@@ -673,14 +672,14 @@
}
private static class SplittableProcessElementsStreamingTranslator<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
+ InputT, OutputT, RestrictionT, PositionT>
extends FlinkStreamingPipelineTranslator.StreamTransformTranslator<
SplittableParDoViaKeyedWorkItems.ProcessElements<
- InputT, OutputT, RestrictionT, TrackerT>> {
+ InputT, OutputT, RestrictionT, PositionT>> {
@Override
public void translateNode(
- SplittableParDoViaKeyedWorkItems.ProcessElements<InputT, OutputT, RestrictionT, TrackerT>
+ SplittableParDoViaKeyedWorkItems.ProcessElements<InputT, OutputT, RestrictionT, PositionT>
transform,
FlinkStreamingTranslationContext context) {
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
index dcc473f..bb26daa 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
@@ -19,7 +19,6 @@
import static org.apache.flink.util.Preconditions.checkNotNull;
-import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
@@ -35,7 +34,7 @@
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiConsumer;
-import java.util.function.Consumer;
+import java.util.function.Supplier;
import java.util.stream.Collectors;
import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleProgressResponse;
import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse;
@@ -110,7 +109,7 @@
private final Map<String, TupleTag<?>> outputMap;
private final Map<RunnerApi.ExecutableStagePayload.SideInputId, PCollectionView<?>> sideInputIds;
/** A lock which has to be acquired when concurrently accessing state and timers. */
- private final Lock stateBackendLock;
+ private final ReentrantLock stateBackendLock;
private transient FlinkExecutableStageContext stageContext;
private transient StateRequestHandler stateRequestHandler;
@@ -331,13 +330,16 @@
}
private void prepareStateBackend(K key, Coder<K> keyCoder) {
- ByteArrayOutputStream baos = new ByteArrayOutputStream();
+ final ByteBuffer encodedKey;
try {
- keyCoder.encode(key, baos);
- } catch (IOException e) {
- throw new RuntimeException("Failed to encode key for Flink state backend", e);
+ // We need to have NESTED context here with the ByteStringCoder.
+ // See StateRequestHandlers.
+ encodedKey =
+ ByteBuffer.wrap(CoderUtils.encodeToByteArray(keyCoder, key, Coder.Context.NESTED));
+ } catch (CoderException e) {
+      throw new RuntimeException("Couldn't set key for state backend", e);
}
- keyedStateBackend.setCurrentKey(ByteBuffer.wrap(baos.toByteArray()));
+ keyedStateBackend.setCurrentKey(encodedKey);
}
};
}
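
For reference, a standalone sketch (hypothetical class, String key assumed) of the encoding that prepareStateBackend now performs: the key bytes are produced with the NESTED coder context so they line up with what StateRequestHandlers sees from the SDK harness.

    import java.nio.ByteBuffer;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.util.CoderUtils;

    class NestedKeyEncodingDemo {
      public static void main(String[] args) throws Exception {
        StringUtf8Coder coder = StringUtf8Coder.of();
        // Same call as above, just with a concrete coder and key value.
        ByteBuffer encodedKey =
            ByteBuffer.wrap(CoderUtils.encodeToByteArray(coder, "user-1", Coder.Context.NESTED));
        System.out.println(encodedKey.remaining() + " bytes");
      }
    }
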
@@ -365,47 +367,36 @@
public void setCurrentKey(Object key) {}
@Override
- public Object getCurrentKey() {
- // This is the key retrieved by HeapInternalTimerService when setting a Flink timer
- return sdkHarnessRunner.getCurrentTimerKey();
+ public ByteBuffer getCurrentKey() {
+ // This is the key retrieved by HeapInternalTimerService when setting a Flink timer.
+ // Note: Only called by the TimerService. Must be guarded by a lock.
+ Preconditions.checkState(
+ stateBackendLock.isLocked(),
+ "State backend must be locked when retrieving the current key.");
+ return this.<ByteBuffer>getKeyedStateBackend().getCurrentKey();
}
private void setTimer(WindowedValue<InputT> timerElement, TimerInternals.TimerData timerData) {
try {
- Object key = keySelector.getKey(timerElement);
- sdkHarnessRunner.setCurrentTimerKey(key);
+      // KvToByteBufferKeySelector returns the key already encoded as a ByteBuffer
+ ByteBuffer encodedKey = (ByteBuffer) keySelector.getKey(timerElement);
// We have to synchronize to ensure the state backend is not concurrently accessed by the
// state requests
try {
stateBackendLock.lock();
- getKeyedStateBackend().setCurrentKey(key);
+ getKeyedStateBackend().setCurrentKey(encodedKey);
timerInternals.setTimer(timerData);
} finally {
stateBackendLock.unlock();
}
} catch (Exception e) {
throw new RuntimeException("Couldn't set timer", e);
- } finally {
- sdkHarnessRunner.setCurrentTimerKey(null);
}
}
@Override
public void fireTimer(InternalTimer<?, TimerInternals.TimerData> timer) {
- // We need to decode the key
final ByteBuffer encodedKey = (ByteBuffer) timer.getKey();
- @SuppressWarnings("ByteBufferBackingArray")
- byte[] bytes = encodedKey.array();
- final Object decodedKey;
- try {
- decodedKey = CoderUtils.decodeFromByteArray(keyCoder, bytes);
- } catch (CoderException e) {
- throw new RuntimeException(
- String.format(Locale.ENGLISH, "Failed to decode encoded key: %s", Arrays.toString(bytes)),
- e);
- }
- // Prepare the SdkHarnessRunner with the key for the timer
- sdkHarnessRunner.setCurrentTimerKey(decodedKey);
// We have to synchronize to ensure the state backend is not concurrently accessed by the state
// requests
try {
@@ -455,8 +446,8 @@
outputManager,
outputMap,
(Coder<BoundedWindow>) windowingStrategy.getWindowFn().windowCoder(),
- keySelector,
- this::setTimer);
+ this::setTimer,
+ () -> FlinkKeyUtils.decodeKey(getCurrentKey(), keyCoder));
return ensureStateCleanup(sdkHarnessRunner);
}
@@ -529,16 +520,11 @@
private final Map<String, TimerSpec> timerOutputIdToSpecMap;
private final Coder<BoundedWindow> windowCoder;
- private final KeySelector<WindowedValue<InputT>, ?> keySelector;
private final BiConsumer<WindowedValue<InputT>, TimerInternals.TimerData> timerRegistration;
+ private final Supplier<Object> keyForTimer;
private RemoteBundle remoteBundle;
private FnDataReceiver<WindowedValue<?>> mainInputReceiver;
- // Timer key set before calling Flink's internal timer service to register
- // a timer. The timer service will retrieve this with a call to {@code getCurrentKey}.
- // Before firing a timer, this will be initialized with the current key
- // from the timer element.
- private Object currentTimerKey;
public SdkHarnessDoFnRunner(
String mainInput,
@@ -548,17 +534,17 @@
BufferedOutputManager<OutputT> outputManager,
Map<String, TupleTag<?>> outputMap,
Coder<BoundedWindow> windowCoder,
- KeySelector<WindowedValue<InputT>, ?> keySelector,
- BiConsumer<WindowedValue<InputT>, TimerInternals.TimerData> timerRegistration) {
+ BiConsumer<WindowedValue<InputT>, TimerInternals.TimerData> timerRegistration,
+ Supplier<Object> keyForTimer) {
this.mainInput = mainInput;
this.stageBundleFactory = stageBundleFactory;
this.stateRequestHandler = stateRequestHandler;
this.progressHandler = progressHandler;
this.outputManager = outputManager;
this.outputMap = outputMap;
- this.keySelector = keySelector;
this.timerRegistration = timerRegistration;
this.timerOutputIdToSpecMap = new HashMap<>();
+ this.keyForTimer = keyForTimer;
// Gather all timers from all transforms by their output pCollectionId which is unique
for (Map<String, ProcessBundleDescriptors.TimerSpec> transformTimerMap :
stageBundleFactory.getProcessBundleDescriptor().getTimerSpecs().values()) {
@@ -609,8 +595,8 @@
@Override
public void onTimer(
String timerId, BoundedWindow window, Instant timestamp, TimeDomain timeDomain) {
- Preconditions.checkNotNull(
- currentTimerKey, "Key for timer needs to be set before calling onTimer");
+ Object timerKey = keyForTimer.get();
+ Preconditions.checkNotNull(timerKey, "Key for timer needs to be set before calling onTimer");
Preconditions.checkNotNull(remoteBundle, "Call to onTimer outside of a bundle");
LOG.debug("timer callback: {} {} {} {}", timerId, window, timestamp, timeDomain);
FnDataReceiver<WindowedValue<?>> timerReceiver =
@@ -620,7 +606,7 @@
timerId);
WindowedValue<KV<Object, Timer>> timerValue =
WindowedValue.of(
- KV.of(currentTimerKey, Timer.of(timestamp, new byte[0])),
+ KV.of(timerKey, Timer.of(timestamp, new byte[0])),
timestamp,
Collections.singleton(window),
PaneInfo.NO_FIRING);
@@ -629,8 +615,6 @@
} catch (Exception e) {
throw new RuntimeException(
String.format(Locale.ENGLISH, "Failed to process timer %s", timerReceiver), e);
- } finally {
- currentTimerKey = null;
}
}
@@ -648,16 +632,6 @@
}
}
- /** Key for timer which has not been registered yet. */
- Object getCurrentTimerKey() {
- return currentTimerKey;
- }
-
- /** Key for timer which is about to be fired. */
- void setCurrentTimerKey(Object key) {
- this.currentTimerKey = key;
- }
-
boolean isBundleInProgress() {
return remoteBundle != null;
}
@@ -723,7 +697,6 @@
windowingStrategy,
keyCoder,
windowCoder,
- sdkHarnessRunner::setCurrentTimerKey,
getKeyedStateBackend());
List<String> userStates =
@@ -744,7 +717,6 @@
private final WindowingStrategy windowingStrategy;
private final Coder keyCoder;
private final Coder windowCoder;
- private final Consumer<ByteBuffer> currentKeyConsumer;
private final KeyedStateBackend<ByteBuffer> keyedStateBackend;
CleanupTimer(
@@ -753,14 +725,12 @@
WindowingStrategy windowingStrategy,
Coder keyCoder,
Coder windowCoder,
- Consumer<ByteBuffer> currentKeyConsumer,
KeyedStateBackend<ByteBuffer> keyedStateBackend) {
this.timerInternals = timerInternals;
this.stateBackendLock = stateBackendLock;
this.windowingStrategy = windowingStrategy;
this.keyCoder = keyCoder;
this.windowCoder = windowCoder;
- this.currentKeyConsumer = currentKeyConsumer;
this.keyedStateBackend = keyedStateBackend;
}
@@ -774,19 +744,10 @@
Preconditions.checkNotNull(input, "Null input passed to CleanupTimer");
// make sure this fires after any window.maxTimestamp() timers
Instant gcTime = LateDataUtils.garbageCollectionTime(window, windowingStrategy).plus(1);
- final ByteBuffer key;
- try {
- key = ByteBuffer.wrap(CoderUtils.encodeToByteArray(keyCoder, ((KV) input).getKey()));
- } catch (CoderException e) {
- throw new RuntimeException("Failed to encode key for Flink state backend", e);
- }
+ final ByteBuffer key = FlinkKeyUtils.encodeKey(((KV) input).getKey(), keyCoder);
// Ensure the state backend is not concurrently accessed by the state requests
try {
stateBackendLock.lock();
- // Set these two to ensure correct timer registration
- // 1) For the timer setting
- currentKeyConsumer.accept(key);
- // 2) For the timer deduplication
keyedStateBackend.setCurrentKey(key);
timerInternals.setTimer(
StateNamespaces.window(windowCoder, window),
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java
new file mode 100644
index 0000000..687a231
--- /dev/null
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtils.java
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink.translation.wrappers.streaming;
+
+import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkNotNull;
+import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkState;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Locale;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.util.CoderUtils;
+
+/**
+ * Utility functions for dealing with key encoding. Beam requires keys to be compared in binary
+ * format. The helpers here ensure that a consistent encoding is used.
+ */
+class FlinkKeyUtils {
+
+ /** Encodes a key to a byte array wrapped inside a ByteBuffer. */
+ static <K> ByteBuffer encodeKey(K key, Coder<K> keyCoder) {
+ checkNotNull(keyCoder, "Provided coder must not be null");
+ final byte[] keyBytes;
+ try {
+ keyBytes = CoderUtils.encodeToByteArray(keyCoder, key);
+ } catch (Exception e) {
+ throw new RuntimeException(String.format(Locale.ENGLISH, "Failed to encode key: %s", key), e);
+ }
+ return ByteBuffer.wrap(keyBytes);
+ }
+
+ /** Decodes a key from a ByteBuffer containing a byte array. */
+ static <K> K decodeKey(ByteBuffer byteBuffer, Coder<K> keyCoder) {
+ checkNotNull(byteBuffer, "Provided ByteBuffer must not be null");
+ checkNotNull(keyCoder, "Provided coder must not be null");
+    checkState(byteBuffer.hasArray(), "ByteBuffer key must be backed by an array.");
+ @SuppressWarnings("ByteBufferBackingArray")
+ final byte[] keyBytes = byteBuffer.array();
+ try {
+ return CoderUtils.decodeFromByteArray(keyCoder, keyBytes);
+ } catch (Exception e) {
+ throw new RuntimeException(
+ String.format(
+ Locale.ENGLISH, "Failed to decode encoded key: %s", Arrays.toString(keyBytes)),
+ e);
+ }
+ }
+}
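The new FlinkKeyUtils helper centralizes the key encoding that was previously duplicated across the operators and key selectors changed below. A minimal usage sketch (a fragment, assuming the imports from the file above and a StringUtf8Coder key; it mirrors what FlinkKeyUtilsTest verifies later in this change):

    StringUtf8Coder coder = StringUtf8Coder.of();
    // Encode the key into the ByteBuffer form used by Flink's keyed state backend.
    ByteBuffer encodedKey = FlinkKeyUtils.encodeKey("key", coder);
    // Decode it back; the same coder must be used on both sides for a consistent round trip.
    String decodedKey = FlinkKeyUtils.decodeKey(encodedKey, coder);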
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KvToByteBufferKeySelector.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KvToByteBufferKeySelector.java
index 9438291..ea181d2 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KvToByteBufferKeySelector.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/KvToByteBufferKeySelector.java
@@ -19,7 +19,6 @@
import java.nio.ByteBuffer;
import org.apache.beam.sdk.coders.Coder;
-import org.apache.beam.sdk.util.CoderUtils;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.KV;
import org.apache.flink.api.common.typeinfo.TypeInformation;
@@ -44,8 +43,7 @@
@Override
public ByteBuffer getKey(WindowedValue<KV<K, V>> value) throws Exception {
K key = value.getValue().getKey();
- byte[] keyBytes = CoderUtils.encodeToByteArray(keyCoder, key);
- return ByteBuffer.wrap(keyBytes);
+ return FlinkKeyUtils.encodeKey(key, keyCoder);
}
@Override
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SplittableDoFnOperator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SplittableDoFnOperator.java
index 5dccc57..ff59500 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SplittableDoFnOperator.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/SplittableDoFnOperator.java
@@ -41,7 +41,6 @@
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.DoFnSchemaInformation;
-import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.util.WindowedValue;
@@ -59,8 +58,7 @@
* Flink operator for executing splittable {@link DoFn DoFns}. Specifically, for executing the
* {@code @ProcessElement} method of a splittable {@link DoFn}.
*/
-public class SplittableDoFnOperator<
- InputT, OutputT, RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>>
+public class SplittableDoFnOperator<InputT, OutputT, RestrictionT>
extends DoFnOperator<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>, OutputT> {
private transient ScheduledExecutorService executorService;
diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
index ccc1ed4..ec77481 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
@@ -20,7 +20,6 @@
import java.nio.ByteBuffer;
import org.apache.beam.runners.core.KeyedWorkItem;
import org.apache.beam.sdk.coders.Coder;
-import org.apache.beam.sdk.util.CoderUtils;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.functions.KeySelector;
@@ -45,8 +44,7 @@
@Override
public ByteBuffer getKey(WindowedValue<SingletonKeyedWorkItem<K, V>> value) throws Exception {
K key = value.getValue().key();
- byte[] keyBytes = CoderUtils.encodeToByteArray(keyCoder, key);
- return ByteBuffer.wrap(keyBytes);
+ return FlinkKeyUtils.encodeKey(key, keyCoder);
}
@Override
diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java
index d609329..0071314 100644
--- a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java
+++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java
@@ -44,7 +44,7 @@
assertThat(config.getExpansionPort(), is(8097));
assertThat(config.getFlinkMasterUrl(), is("[auto]"));
assertThat(config.getSdkWorkerParallelism(), is(1L));
- assertThat(config.isCleanArtifactsPerJob(), is(false));
+ assertThat(config.isCleanArtifactsPerJob(), is(true));
FlinkJobServerDriver flinkJobServerDriver = FlinkJobServerDriver.fromConfig(config);
assertThat(flinkJobServerDriver, is(not(nullValue())));
}
@@ -63,7 +63,7 @@
"44",
"--flink-master-url=jobmanager",
"--sdk-worker-parallelism=4",
- "--clean-artifacts-per-job",
+ "--clean-artifacts-per-job=false",
});
FlinkJobServerDriver.FlinkServerConfiguration config =
(FlinkJobServerDriver.FlinkServerConfiguration) driver.configuration;
@@ -73,7 +73,7 @@
assertThat(config.getExpansionPort(), is(44));
assertThat(config.getFlinkMasterUrl(), is("jobmanager"));
assertThat(config.getSdkWorkerParallelism(), is(4L));
- assertThat(config.isCleanArtifactsPerJob(), is(true));
+ assertThat(config.isCleanArtifactsPerJob(), is(false));
}
@Test
diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java
index 8a623bf..a73d897 100644
--- a/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java
+++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java
@@ -18,8 +18,6 @@
package org.apache.beam.runners.flink.metrics;
import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
-import static org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN;
-import static org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder.USER_COUNTER_URN_PREFIX;
import static org.apache.beam.runners.flink.metrics.FlinkMetricContainer.getFlinkMetricNameString;
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertNotNull;
@@ -42,6 +40,8 @@
import org.apache.beam.runners.core.metrics.DistributionCell;
import org.apache.beam.runners.core.metrics.DistributionData;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns;
import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.runners.flink.metrics.FlinkMetricContainer.FlinkDistributionGauge;
import org.apache.beam.sdk.metrics.Counter;
@@ -146,7 +146,7 @@
assertNotNull(userCountMonitoringInfo);
SimpleMonitoringInfoBuilder elemCountBuilder = new SimpleMonitoringInfoBuilder();
- elemCountBuilder.setUrn(ELEMENT_COUNT_URN);
+ elemCountBuilder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
elemCountBuilder.setInt64Value(222);
elemCountBuilder.setPTransformLabel("step");
elemCountBuilder.setPCollectionLabel("pcoll");
@@ -168,7 +168,7 @@
MonitoringInfo intCounter =
MonitoringInfo.newBuilder()
- .setUrn(USER_COUNTER_URN_PREFIX + "ns1:int_counter")
+ .setUrn(Urns.USER_COUNTER_PREFIX + "ns1:int_counter")
.putLabels(PTRANSFORM_LABEL, "step")
.setMetric(
Metric.newBuilder().setCounterData(CounterData.newBuilder().setInt64Value(111)))
@@ -176,7 +176,7 @@
MonitoringInfo doubleCounter =
MonitoringInfo.newBuilder()
- .setUrn(USER_COUNTER_URN_PREFIX + "ns2:double_counter")
+ .setUrn(Urns.USER_COUNTER_PREFIX + "ns2:double_counter")
.putLabels(PTRANSFORM_LABEL, "step")
.setMetric(
Metric.newBuilder().setCounterData(CounterData.newBuilder().setDoubleValue(222)))
@@ -184,7 +184,7 @@
MonitoringInfo intDistribution =
MonitoringInfo.newBuilder()
- .setUrn(USER_COUNTER_URN_PREFIX + "ns3:int_distribution")
+ .setUrn(Urns.USER_COUNTER_PREFIX + "ns3:int_distribution")
.putLabels(PTRANSFORM_LABEL, "step")
.setMetric(
Metric.newBuilder()
@@ -200,7 +200,7 @@
MonitoringInfo doubleDistribution =
MonitoringInfo.newBuilder()
- .setUrn(USER_COUNTER_URN_PREFIX + "ns4:double_distribution")
+ .setUrn(Urns.USER_COUNTER_PREFIX + "ns4:double_distribution")
.putLabels(PTRANSFORM_LABEL, "step")
.setMetric(
Metric.newBuilder()
diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java
index 2b51583..1e5d120 100644
--- a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java
+++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperatorTest.java
@@ -36,7 +36,6 @@
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
-import java.util.function.Consumer;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
@@ -66,7 +65,6 @@
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
-import org.apache.beam.sdk.util.CoderUtils;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TupleTag;
@@ -395,7 +393,6 @@
@Test
public void testEnsureStateCleanupWithKeyedInputCleanupTimer() throws Exception {
InMemoryTimerInternals inMemoryTimerInternals = new InMemoryTimerInternals();
- Consumer<ByteBuffer> keyConsumer = Mockito.mock(Consumer.class);
KeyedStateBackend keyedStateBackend = Mockito.mock(KeyedStateBackend.class);
Lock stateBackendLock = Mockito.mock(Lock.class);
StringUtf8Coder keyCoder = StringUtf8Coder.of();
@@ -410,13 +407,11 @@
WindowingStrategy.globalDefault(),
keyCoder,
windowCoder,
- keyConsumer,
keyedStateBackend);
cleanupTimer.setForWindow(KV.of("key", "string"), window);
Mockito.verify(stateBackendLock).lock();
- ByteBuffer key = ByteBuffer.wrap(CoderUtils.encodeToByteArray(keyCoder, "key"));
- Mockito.verify(keyConsumer).accept(key);
+ ByteBuffer key = FlinkKeyUtils.encodeKey("key", keyCoder);
Mockito.verify(keyedStateBackend).setCurrentKey(key);
assertThat(
inMemoryTimerInternals.getNextTimer(TimeDomain.EVENT_TIME),
diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java
new file mode 100644
index 0000000..06b5d01
--- /dev/null
+++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkKeyUtilsTest.java
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink.translation.wrappers.streaming;
+
+import static org.hamcrest.CoreMatchers.nullValue;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.core.Is.is;
+
+import com.google.protobuf.ByteString;
+import java.nio.ByteBuffer;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.coders.VoidCoder;
+import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
+import org.junit.Test;
+
+/** Tests for {@link FlinkKeyUtils}. */
+public class FlinkKeyUtilsTest {
+
+ @Test
+ public void testEncodeDecode() {
+ String key = "key";
+ StringUtf8Coder coder = StringUtf8Coder.of();
+
+ ByteBuffer byteBuffer = FlinkKeyUtils.encodeKey(key, coder);
+ assertThat(FlinkKeyUtils.decodeKey(byteBuffer, coder), is(key));
+ }
+
+ @Test
+ public void testNullKey() {
+ Void key = null;
+ VoidCoder coder = VoidCoder.of();
+
+ ByteBuffer byteBuffer = FlinkKeyUtils.encodeKey(key, coder);
+ assertThat(FlinkKeyUtils.decodeKey(byteBuffer, coder), is(nullValue()));
+ }
+
+ @Test
+ @SuppressWarnings("ByteBufferBackingArray")
+ public void testCoderContext() throws Exception {
+ byte[] bytes = {1, 1, 1};
+ ByteString key = ByteString.copyFrom(bytes);
+ ByteStringCoder coder = ByteStringCoder.of();
+
+ ByteBuffer encoded = FlinkKeyUtils.encodeKey(key, coder);
+    // Ensure the outer coder context is used, i.e. no length prefix is added to the encoded key.
+ assertThat(encoded.array(), is(bytes));
+ }
+}
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
index 2c473a2..049a904 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
@@ -39,9 +39,9 @@
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.util.MonitoringUtil;
import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
import org.apache.beam.sdk.metrics.MetricResults;
import org.apache.beam.sdk.runners.AppliedPTransform;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.BiMap;
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
index 5aa9038..7b13e59 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
@@ -88,6 +88,7 @@
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.transforms.display.HasDisplayData;
+import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;
import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
import org.apache.beam.sdk.transforms.windowing.DefaultTrigger;
@@ -941,6 +942,20 @@
transform.getMainOutputTag(),
outputCoders,
doFnSchemaInformation);
+
+ // TODO: Move this logic into translateFn once the legacy ProcessKeyedElements is
+ // removed.
+ if (context.isFnApi()) {
+ DoFnSignature signature = DoFnSignatures.signatureForDoFn(transform.getFn());
+ if (signature.processElement().isSplittable()) {
+ Coder<?> restrictionCoder =
+ DoFnInvokers.invokerFor(transform.getFn())
+ .invokeGetRestrictionCoder(
+ context.getInput(transform).getPipeline().getCoderRegistry());
+ stepContext.addInput(
+ PropertyNames.RESTRICTION_ENCODING, translateCoder(restrictionCoder, context));
+ }
+ }
}
});
@@ -983,6 +998,20 @@
transform.getMainOutputTag(),
outputCoders,
doFnSchemaInformation);
+
+ // TODO: Move this logic into translateFn once the legacy ProcessKeyedElements is
+ // removed.
+ if (context.isFnApi()) {
+ DoFnSignature signature = DoFnSignatures.signatureForDoFn(transform.getFn());
+ if (signature.processElement().isSplittable()) {
+ Coder<?> restrictionCoder =
+ DoFnInvokers.invokerFor(transform.getFn())
+ .invokeGetRestrictionCoder(
+ context.getInput(transform).getPipeline().getCoderRegistry());
+ stepContext.addInput(
+ PropertyNames.RESTRICTION_ENCODING, translateCoder(restrictionCoder, context));
+ }
+ }
}
});
@@ -1013,7 +1042,7 @@
registerTransformTranslator(Read.Bounded.class, new ReadTranslator());
///////////////////////////////////////////////////////////////////////////
- // Splittable DoFn translation.
+ // Legacy Splittable DoFn translation.
registerTransformTranslator(
SplittableParDo.ProcessKeyedElements.class,
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
index d7b231e..1280edd 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
@@ -356,18 +356,21 @@
DeduplicatedFlattenFactory.create()))
.add(
PTransformOverride.of(
- PTransformMatchers.emptyFlatten(), EmptyFlattenAsCreateFactory.instance()))
- // By default Dataflow runner replaces single-output ParDo with a ParDoSingle override.
- // However, we want a different expansion for single-output splittable ParDo.
- .add(
- PTransformOverride.of(
- PTransformMatchers.splittableParDoSingle(),
- new ReflectiveOneToOneOverrideFactory(
- SplittableParDoOverrides.ParDoSingleViaMulti.class, this)))
- .add(
- PTransformOverride.of(
- PTransformMatchers.splittableParDoMulti(),
- new SplittableParDoOverrides.SplittableParDoOverrideFactory()));
+ PTransformMatchers.emptyFlatten(), EmptyFlattenAsCreateFactory.instance()));
+ if (!fnApiEnabled) {
+ // By default Dataflow runner replaces single-output ParDo with a ParDoSingle override.
+ // However, we want a different expansion for single-output splittable ParDo.
+ overridesBuilder
+ .add(
+ PTransformOverride.of(
+ PTransformMatchers.splittableParDoSingle(),
+ new ReflectiveOneToOneOverrideFactory(
+ SplittableParDoOverrides.ParDoSingleViaMulti.class, this)))
+ .add(
+ PTransformOverride.of(
+ PTransformMatchers.splittableParDoMulti(),
+ new SplittableParDoOverrides.SplittableParDoOverrideFactory()));
+ }
if (streaming) {
if (!hasExperiment(options, "enable_custom_pubsub_source")) {
overridesBuilder.add(
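With this change the legacy splittable ParDo overrides are registered only when the FnAPI path is disabled; under FnAPI, splittable ParDos flow through the regular ParDo translation instead. A hedged sketch of how a pipeline opts into that path (the experiment name matches the hasExperiment checks elsewhere in this change; the options construction is illustrative):

    // Enable the portable FnAPI execution path so that the splittable ParDo
    // overrides above are skipped and the ParDo is translated directly.
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setExperiments(Arrays.asList("beam_fn_api"));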
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java
index e07fa44..19a01d8 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/PrimitiveParDoSingleFactory.java
@@ -19,6 +19,7 @@
import static org.apache.beam.runners.core.construction.PTransformTranslation.PAR_DO_TRANSFORM_URN;
import static org.apache.beam.runners.core.construction.ParDoTranslation.translateTimerSpec;
+import static org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment;
import static org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getStateSpecOrThrow;
import static org.apache.beam.sdk.transforms.reflect.DoFnSignatures.getTimerSpecOrThrow;
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
@@ -48,6 +49,7 @@
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;
import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
import org.apache.beam.sdk.values.PCollection;
@@ -163,11 +165,14 @@
final ParDoSingle<?, ?> parDo = transform.getTransform();
final DoFn<?, ?> doFn = parDo.getFn();
final DoFnSignature signature = DoFnSignatures.getSignature(doFn.getClass());
- checkArgument(
- !signature.processElement().isSplittable(),
- String.format(
- "Not expecting a splittable %s: should have been overridden",
- ParDoSingle.class.getSimpleName()));
+
+ if (!hasExperiment(transform.getPipeline().getOptions(), "beam_fn_api")) {
+ checkArgument(
+ !signature.processElement().isSplittable(),
+ String.format(
+ "Not expecting a splittable %s: should have been overridden",
+ ParDoSingle.class.getSimpleName()));
+ }
// TODO: Is there a better way to do this?
Set<String> allInputs =
@@ -233,11 +238,24 @@
@Override
public boolean isSplittable() {
- return false;
+ return signature.processElement().isSplittable();
}
@Override
public String translateRestrictionCoderId(SdkComponents newComponents) {
+ if (signature.processElement().isSplittable()) {
+ Coder<?> restrictionCoder =
+ DoFnInvokers.invokerFor(doFn)
+ .invokeGetRestrictionCoder(transform.getPipeline().getCoderRegistry());
+ try {
+ return newComponents.registerCoder(restrictionCoder);
+ } catch (IOException e) {
+ throw new IllegalStateException(
+ String.format(
+ "Unable to register restriction coder for %s.", transform.getFullName()),
+ e);
+ }
+ }
return "";
}
},
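For context, a minimal splittable DoFn of the shape exercised by the updated translator tests; under the FnAPI experiment, translateRestrictionCoderId above registers the coder returned by @GetRestrictionCoder (here SerializableCoder of OffsetRange, matching the assertions in DataflowPipelineTranslatorTest). This is an illustrative sketch, not part of the change:

    private static class SketchSplittableFn extends DoFn<String, Integer> {
      @ProcessElement
      public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
        // noop; a real fn would claim offsets from the tracker and emit outputs.
      }

      @GetInitialRestriction
      public OffsetRange getInitialRestriction(String element) {
        return new OffsetRange(0, element.length());
      }

      @GetRestrictionCoder
      public Coder<OffsetRange> getRestrictionCoder() {
        // This coder is what ends up registered as the restriction encoding.
        return SerializableCoder.of(OffsetRange.class);
      }
    }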
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DataflowTransport.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DataflowTransport.java
index 6b62fac..c7b1be8 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DataflowTransport.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DataflowTransport.java
@@ -17,8 +17,8 @@
*/
package org.apache.beam.runners.dataflow.util;
-import static org.apache.beam.sdk.util.Transport.getJsonFactory;
-import static org.apache.beam.sdk.util.Transport.getTransport;
+import static org.apache.beam.sdk.extensions.gcp.util.Transport.getJsonFactory;
+import static org.apache.beam.sdk.extensions.gcp.util.Transport.getTransport;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.services.clouddebugger.v2.CloudDebugger;
@@ -30,7 +30,7 @@
import java.net.URL;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
/** Helpers for cloud communication. */
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java
index 25d6df9..237d92d 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DefaultCoderCloudObjectTranslatorRegistrar.java
@@ -69,6 +69,7 @@
CloudObjectTranslators.windowedValue(),
new AvroCoderCloudObjectTranslator(),
new SerializableCoderCloudObjectTranslator(),
+ new SchemaCoderCloudObjectTranslator(),
CloudObjectTranslators.iterableLike(CollectionCoder.class),
CloudObjectTranslators.iterableLike(ListCoder.class),
CloudObjectTranslators.iterableLike(SetCoder.class),
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
index 259fc1a..1517e28 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
@@ -48,10 +48,10 @@
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Internal;
import org.apache.beam.sdk.extensions.gcp.storage.GcsCreateOptions;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.CreateOptions;
import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.MimeTypes;
import org.apache.beam.sdk.util.MoreFutures;
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java
index e644e0f..e44170d 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PropertyNames.java
@@ -62,7 +62,13 @@
public static final String VALUE = "value";
public static final String WINDOWING_STRATEGY = "windowing_strategy";
public static final String DISPLAY_DATA = "display_data";
- public static final String RESTRICTION_CODER = "restriction_coder";
+ /**
+   * @deprecated Uses incorrect terminology; use {@link #RESTRICTION_ENCODING} instead. Should be
+   *     removed once the non-FnAPI SplittableDoFn expansion for Dataflow is removed.
+ */
+ @Deprecated public static final String RESTRICTION_CODER = "restriction_coder";
+
public static final String IMPULSE_ELEMENT = "impulse_element";
public static final String PIPELINE_PROTO_CODER_ID = "pipeline_proto_coder_id";
+ public static final String RESTRICTION_ENCODING = "restriction_encoding";
}
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java
new file mode 100644
index 0000000..2395f12
--- /dev/null
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/SchemaCoderCloudObjectTranslator.java
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.util;
+
+import java.io.IOException;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.SchemaTranslation;
+import org.apache.beam.runners.core.construction.SdkComponents;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.util.SerializableUtils;
+import org.apache.beam.sdk.util.StringUtils;
+
+/** Translator for Schema coders. */
+public class SchemaCoderCloudObjectTranslator implements CloudObjectTranslator<SchemaCoder> {
+ private static final String SCHEMA = "schema";
+ private static final String TO_ROW_FUNCTION = "toRowFunction";
+ private static final String FROM_ROW_FUNCTION = "fromRowFunction";
+
+ /** Convert to a cloud object. */
+ @Override
+ public CloudObject toCloudObject(SchemaCoder target, SdkComponents sdkComponents) {
+ CloudObject base = CloudObject.forClass(SchemaCoder.class);
+
+ Structs.addString(
+ base,
+ TO_ROW_FUNCTION,
+ StringUtils.byteArrayToJsonString(
+ SerializableUtils.serializeToByteArray(target.getToRowFunction())));
+ Structs.addString(
+ base,
+ FROM_ROW_FUNCTION,
+ StringUtils.byteArrayToJsonString(
+ SerializableUtils.serializeToByteArray(target.getFromRowFunction())));
+ Structs.addString(
+ base,
+ SCHEMA,
+ StringUtils.byteArrayToJsonString(
+ SchemaTranslation.toProto(target.getSchema()).toByteArray()));
+ return base;
+ }
+
+ /** Convert from a cloud object. */
+ @Override
+ public SchemaCoder fromCloudObject(CloudObject cloudObject) {
+ try {
+ SerializableFunction toRowFunction =
+ (SerializableFunction)
+ SerializableUtils.deserializeFromByteArray(
+ StringUtils.jsonStringToByteArray(
+ Structs.getString(cloudObject, TO_ROW_FUNCTION)),
+ "toRowFunction");
+ SerializableFunction fromRowFunction =
+ (SerializableFunction)
+ SerializableUtils.deserializeFromByteArray(
+ StringUtils.jsonStringToByteArray(
+ Structs.getString(cloudObject, FROM_ROW_FUNCTION)),
+ "fromRowFunction");
+ RunnerApi.Schema protoSchema =
+ RunnerApi.Schema.parseFrom(
+ StringUtils.jsonStringToByteArray(Structs.getString(cloudObject, SCHEMA)));
+ Schema schema = SchemaTranslation.fromProto(protoSchema);
+ return SchemaCoder.of(schema, toRowFunction, fromRowFunction);
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public Class<? extends SchemaCoder> getSupportedClass() {
+ return SchemaCoder.class;
+ }
+
+ @Override
+ public String cloudObjectClassName() {
+ return CloudObject.forClass(SchemaCoder.class).getClassName();
+ }
+}
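A hedged sketch of the round trip this translator enables (the empty schema mirrors the case added to CloudObjectsTest below; SdkComponents.create() is assumed here as an empty component registry):

    SchemaCoderCloudObjectTranslator translator = new SchemaCoderCloudObjectTranslator();
    SchemaCoder<Row> coder = SchemaCoder.of(Schema.builder().build());
    // Serialize the schema plus the to/from Row functions into a CloudObject...
    CloudObject cloudObject = translator.toCloudObject(coder, SdkComponents.create());
    // ...and reconstruct an equivalent SchemaCoder from it.
    SchemaCoder<?> roundTripped = translator.fromCloudObject(cloudObject);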
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java
index 88b5e4e..a13791c 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverridesTest.java
@@ -36,6 +36,8 @@
import org.apache.beam.sdk.Pipeline.PipelineVisitor;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.runners.TransformHierarchy.Node;
import org.apache.beam.sdk.state.StateSpec;
@@ -44,8 +46,6 @@
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
index b20fd9d..48faf23 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
@@ -51,12 +51,12 @@
import org.apache.beam.sdk.PipelineResult.State;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.testing.ExpectedLogs;
import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.util.BackOffAdapter;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
import org.apache.beam.sdk.values.PInput;
import org.apache.beam.sdk.values.POutput;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
index ef68807..e027bae 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
@@ -68,6 +68,8 @@
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.coders.VoidCoder;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.range.OffsetRange;
@@ -84,14 +86,12 @@
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.display.DisplayData;
-import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.transforms.windowing.WindowFn;
import org.apache.beam.sdk.util.DoFnInfo;
-import org.apache.beam.sdk.util.GcsUtil;
import org.apache.beam.sdk.util.SerializableUtils;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
@@ -730,12 +730,11 @@
assertEquals(SerializableCoder.of(OffsetRange.class), restrictionCoder);
}
- /** Smoke test to fail fast if translation of a splittable ParDo in streaming breaks. */
+  /** Smoke test to fail fast if translation of a splittable ParDo in FnAPI breaks. */
@Test
- public void testStreamingSplittableParDoTranslationFnApi() throws Exception {
+ public void testSplittableParDoTranslationFnApi() throws Exception {
DataflowPipelineOptions options = buildPipelineOptions();
DataflowRunner runner = DataflowRunner.fromOptions(options);
- options.setStreaming(true);
options.setExperiments(Arrays.asList("beam_fn_api"));
DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options);
@@ -753,27 +752,27 @@
Job job = result.getJob();
- // The job should contain a SplittableParDo.ProcessKeyedElements step, translated as
- // "SplittableProcessKeyed".
+    // The job should contain a ParDo step that carries a "restriction_encoding" property.
List<Step> steps = job.getSteps();
- Step processKeyedStep = null;
+ Step splittableParDo = null;
for (Step step : steps) {
- if ("SplittableProcessKeyed".equals(step.getKind())) {
- assertNull(processKeyedStep);
- processKeyedStep = step;
+ if ("ParallelDo".equals(step.getKind())
+ && step.getProperties().containsKey(PropertyNames.RESTRICTION_ENCODING)) {
+ assertNull(splittableParDo);
+ splittableParDo = step;
}
}
- assertNotNull(processKeyedStep);
+ assertNotNull(splittableParDo);
- String fn = Structs.getString(processKeyedStep.getProperties(), PropertyNames.SERIALIZED_FN);
+ String fn = Structs.getString(splittableParDo.getProperties(), PropertyNames.SERIALIZED_FN);
Components componentsProto = result.getPipelineProto().getComponents();
RehydratedComponents components = RehydratedComponents.forComponents(componentsProto);
- RunnerApi.PTransform spkTransform = componentsProto.getTransformsOrThrow(fn);
+ RunnerApi.PTransform splittableTransform = componentsProto.getTransformsOrThrow(fn);
assertEquals(
- PTransformTranslation.SPLITTABLE_PROCESS_KEYED_URN, spkTransform.getSpec().getUrn());
- ParDoPayload payload = ParDoPayload.parseFrom(spkTransform.getSpec().getPayload());
+ PTransformTranslation.PAR_DO_TRANSFORM_URN, splittableTransform.getSpec().getUrn());
+ ParDoPayload payload = ParDoPayload.parseFrom(splittableTransform.getSpec().getPayload());
assertThat(
ParDoTranslation.doFnWithExecutionInformationFromProto(payload.getDoFn()).getDoFn(),
instanceOf(TestSplittableFn.class));
@@ -787,7 +786,7 @@
CloudObjects.coderFromCloudObject(
(CloudObject)
Structs.getObject(
- processKeyedStep.getProperties(), PropertyNames.RESTRICTION_CODER));
+ splittableParDo.getProperties(), PropertyNames.RESTRICTION_ENCODING));
assertEquals(SerializableCoder.of(OffsetRange.class), restrictionCoder);
}
@@ -975,7 +974,7 @@
private static class TestSplittableFn extends DoFn<String, Integer> {
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
// noop
}
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
index 101d8d5..9214ecf 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
@@ -86,6 +86,8 @@
import org.apache.beam.sdk.extensions.gcp.auth.NoopCredentialFactory;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.DynamicFileDestinations;
import org.apache.beam.sdk.io.FileBasedSink;
import org.apache.beam.sdk.io.FileSystems;
@@ -119,8 +121,6 @@
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java
index 1973706..24a5fe0 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/TestDataflowRunnerTest.java
@@ -48,13 +48,13 @@
import org.apache.beam.sdk.PipelineResult.State;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.SerializableMatcher;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;
import org.apache.beam.sdk.transforms.Create;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Optional;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java
index 0d65376..4be760b 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/CloudObjectsTest.java
@@ -50,6 +50,8 @@
import org.apache.beam.sdk.coders.SetCoder;
import org.apache.beam.sdk.coders.StructuredCoder;
import org.apache.beam.sdk.coders.VarLongCoder;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaCoder;
import org.apache.beam.sdk.transforms.join.CoGbkResult.CoGbkResultCoder;
import org.apache.beam.sdk.transforms.join.CoGbkResultSchema;
import org.apache.beam.sdk.transforms.join.UnionCoder;
@@ -140,7 +142,8 @@
CoGbkResultCoder.of(
CoGbkResultSchema.of(
ImmutableList.of(new TupleTag<Long>(), new TupleTag<byte[]>())),
- UnionCoder.of(ImmutableList.of(VarLongCoder.of(), ByteArrayCoder.of()))));
+ UnionCoder.of(ImmutableList.of(VarLongCoder.of(), ByteArrayCoder.of()))))
+ .add(SchemaCoder.of(Schema.builder().build()));
for (Class<? extends Coder> atomicCoder :
DefaultCoderCloudObjectTranslatorRegistrar.KNOWN_ATOMIC_CODERS) {
dataBuilder.add(InstanceBuilder.ofType(atomicCoder).fromFactoryMethod("of").build());
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
index 2e4d9cf..e21a5ab 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
@@ -68,6 +68,10 @@
import javax.annotation.Nullable;
import org.apache.beam.runners.dataflow.util.PackageUtil.PackageAttributes;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.StorageObjectOrIOException;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.CreateOptions;
import org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions;
@@ -75,11 +79,7 @@
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.ExpectedLogs;
import org.apache.beam.sdk.testing.RegexMatcher;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.GcsUtil.StorageObjectOrIOException;
import org.apache.beam.sdk.util.MimeTypes;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowApiUtils.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowApiUtils.java
index ed3bf54..18bebd6 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowApiUtils.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowApiUtils.java
@@ -22,7 +22,7 @@
import com.google.api.client.json.JsonGenerator;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
-import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.io.ByteStreams;
import org.apache.beam.vendor.guava.v20_0.com.google.common.io.CountingOutputStream;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java
index 3370856..b383b06 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClient.java
@@ -42,7 +42,7 @@
import org.apache.beam.runners.dataflow.util.PropertyNames;
import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingMDC;
import org.apache.beam.runners.dataflow.worker.util.common.worker.WorkProgressUpdater;
-import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Optional;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
index cea383d..40584c13 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
@@ -115,6 +115,7 @@
import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.GetWorkStream;
import org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub.StreamPool;
import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.fn.IdGenerator;
import org.apache.beam.sdk.fn.IdGenerators;
import org.apache.beam.sdk.io.FileSystems;
@@ -122,7 +123,6 @@
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.Sleeper;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.util.UserCodeException;
import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder;
import org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.ByteString;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java
index cf4fc75..b09eaf9 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java
@@ -23,6 +23,7 @@
import java.util.Optional;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
@@ -37,7 +38,7 @@
private final SpecMonitoringInfoValidator specValidator;
private final Map<String, NameContext> pcollectionIdToNameContext;
- private static final String SUPPORTED_URN = "beam:metric:element_count:v1";
+ private static final String SUPPORTED_URN = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
/**
* @param specValidator SpecMonitoringInfoValidator to utilize for default validation.
@@ -68,8 +69,8 @@
throw new RuntimeException(String.format("Received unexpected counter urn: %s", urn));
}
- // todo(migryz): extract and utilize pcollection label from beam_fn_api.proto
- if (!pcollectionIdToNameContext.containsKey(monitoringInfo.getLabelsMap().get("PCOLLECTION"))) {
+ if (!pcollectionIdToNameContext.containsKey(
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PCOLLECTION))) {
return Optional.of(
"Encountered ElementCount MonitoringInfo with unknown PCollectionId: "
+ monitoringInfo.toString());
@@ -95,7 +96,8 @@
long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
- final String pcollectionId = monitoringInfo.getLabelsMap().get("PCOLLECTION");
+ final String pcollectionId =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PCOLLECTION);
final String pcollectionName = pcollectionIdToNameContext.get(pcollectionId).userName();
String counterName = pcollectionName + "-ElementCount";
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformer.java
index 478a7a3..49058b6 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformer.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformer.java
@@ -58,6 +58,10 @@
ElementCountMonitoringInfoToCounterUpdateTransformer.getSupportedUrn(),
new ElementCountMonitoringInfoToCounterUpdateTransformer(
specValidator, sdkPCollectionIdToNameContext));
+ this.counterTransformers.put(
+ MeanByteCountMonitoringInfoToCounterUpdateTransformer.getSupportedUrn(),
+ new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+ specValidator, sdkPCollectionIdToNameContext));
}
/** Allows for injection of user and generic counter transformers for more convenient testing. */
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformer.java
index ffc8b5e..074c409 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformer.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformer.java
@@ -26,6 +26,7 @@
import java.util.Optional;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor;
@@ -73,9 +74,9 @@
@VisibleForTesting
protected Map<String, String> createKnownUrnToCounterNameMapping() {
Map<String, String> result = new HashMap<>();
- result.put("beam:metric:pardo_execution_time:start_bundle_msecs:v1", "start-msecs");
- result.put("beam:metric:pardo_execution_time:process_bundle_msecs:v1", "process-msecs");
- result.put("beam:metric:pardo_execution_time:finish_bundle_msecs:v1", "finish-msecs");
+ result.put(MonitoringInfoConstants.Urns.START_BUNDLE_MSECS, "start-msecs");
+ result.put(MonitoringInfoConstants.Urns.PROCESS_BUNDLE_MSECS, "process-msecs");
+ result.put(MonitoringInfoConstants.Urns.FINISH_BUNDLE_MSECS, "finish-msecs");
return result;
}
@@ -97,7 +98,8 @@
throw new RuntimeException(String.format("Received unexpected counter urn: %s", urn));
}
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
DataflowStepContext stepContext = transformIdMapping.get(ptransform);
if (stepContext == null) {
return Optional.of(
@@ -120,7 +122,8 @@
long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
String urn = monitoringInfo.getUrn();
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
DataflowStepContext stepContext = transformIdMapping.get(ptransform);
String counterName = urnToCounterNameMapping.get(urn);
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformer.java
new file mode 100644
index 0000000..7966ef9
--- /dev/null
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformer.java
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.fn.control;
+
+import static org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.longToSplitInt;
+
+import com.google.api.services.dataflow.model.CounterUpdate;
+import com.google.api.services.dataflow.model.IntegerMean;
+import com.google.api.services.dataflow.model.NameAndKind;
+import java.util.Map;
+import java.util.Optional;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.MetricsApi.IntDistributionData;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
+import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
+import org.apache.beam.runners.dataflow.worker.counters.NameContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** MonitoringInfo to CounterUpdate transformer capable of transforming the MeanByteCount counter. */
+public class MeanByteCountMonitoringInfoToCounterUpdateTransformer
+ implements MonitoringInfoToCounterUpdateTransformer {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(MeanByteCountMonitoringInfoToCounterUpdateTransformer.class);
+
+ private final SpecMonitoringInfoValidator specValidator;
+ private final Map<String, NameContext> pcollectionIdToNameContext;
+
+ // TODO(BEAM-6945): utilize value from metrics.proto once it gets in.
+ private static final String SUPPORTED_URN = "beam:metric:sampled_byte_size:v1";
+
+ /**
+ * @param specValidator SpecMonitoringInfoValidator to utilize for default validation.
+   * @param pcollectionIdToNameContext This mapping is used to generate the DFE CounterUpdate name.
+ */
+ public MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+ SpecMonitoringInfoValidator specValidator,
+ Map<String, NameContext> pcollectionIdToNameContext) {
+ this.specValidator = specValidator;
+ this.pcollectionIdToNameContext = pcollectionIdToNameContext;
+ }
+
+ /**
+ * Validates provided monitoring info against specs and common safety checks.
+ *
+ * @param monitoringInfo to validate.
+   * @return Optional.empty() if all validation checks pass; Optional with error text otherwise.
+   * @throws RuntimeException if an unexpected urn is received.
+ */
+ protected Optional<String> validate(MonitoringInfo monitoringInfo) {
+ Optional<String> validatorResult = specValidator.validate(monitoringInfo);
+ if (validatorResult.isPresent()) {
+ return validatorResult;
+ }
+
+ String urn = monitoringInfo.getUrn();
+ if (!urn.equals(SUPPORTED_URN)) {
+ throw new RuntimeException(String.format("Received unexpected counter urn: %s", urn));
+ }
+
+ // TODO(migryz): extract and utilize pcollection label from beam_fn_api.proto
+ if (!pcollectionIdToNameContext.containsKey(
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PCOLLECTION))) {
+      return Optional.of(
+          "Encountered MeanByteCount MonitoringInfo with unknown PCollectionId: "
+              + monitoringInfo.toString());
+ }
+
+ return Optional.empty();
+ }
+
+ /**
+   * Generates CounterUpdate to send to DFE based on MeanByteCount MonitoringInfo.
+ *
+ * @param monitoringInfo Monitoring info to transform.
+   * @return CounterUpdate based on the provided monitoringInfo, or null if validation fails.
+ */
+ @Override
+ @Nullable
+ public CounterUpdate transform(MonitoringInfo monitoringInfo) {
+ Optional<String> validationResult = validate(monitoringInfo);
+ if (validationResult.isPresent()) {
+ LOG.info(validationResult.get());
+ return null;
+ }
+
+ IntDistributionData value =
+ monitoringInfo.getMetric().getDistributionData().getIntDistributionData();
+
+ final String pcollectionId =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PCOLLECTION);
+ final String pcollectionName = pcollectionIdToNameContext.get(pcollectionId).userName();
+
+ String counterName = pcollectionName + "-MeanByteCount";
+ NameAndKind name = new NameAndKind();
+ name.setName(counterName).setKind("MEAN");
+
+ return new CounterUpdate()
+ .setNameAndKind(name)
+ .setCumulative(true)
+ .setIntegerMean(
+ new IntegerMean()
+ .setSum(longToSplitInt(value.getSum()))
+ .setCount(longToSplitInt(value.getCount())));
+ }
+
+  /** @return the URN that this transformer can convert to CounterUpdates. */
+ public static String getSupportedUrn() {
+ return SUPPORTED_URN;
+ }
+}
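For reference, a minimal usage sketch of the new transformer, mirroring the test added later in this change. The validator instance and the "pc1" id are illustrative assumptions only, not part of this patch:

    // Sketch only: validator is assumed to be some SpecMonitoringInfoValidator instance.
    Map<String, NameContext> pcollectionIdToNameContext = new HashMap<>();
    pcollectionIdToNameContext.put(
        "pc1", NameContext.create("stage", "origin", "system", "myPCollection"));

    MonitoringInfo monitoringInfo =
        MonitoringInfo.newBuilder()
            .setUrn("beam:metric:sampled_byte_size:v1")
            .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "pc1")
            .build();

    CounterUpdate update =
        new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
                validator, pcollectionIdToNameContext)
            .transform(monitoringInfo);
    // The resulting counter has kind MEAN and name "myPCollection-MeanByteCount".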
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformer.java
index 9ca66a0..ab0c29f 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformer.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformer.java
@@ -17,8 +17,6 @@
*/
package org.apache.beam.runners.dataflow.worker.fn.control;
-import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
-
import com.google.api.services.dataflow.model.CounterMetadata;
import com.google.api.services.dataflow.model.CounterStructuredName;
import com.google.api.services.dataflow.model.CounterStructuredNameAndMetadata;
@@ -29,7 +27,7 @@
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.MetricsApi.IntDistributionData;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs.Enum;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Origin;
@@ -58,11 +56,7 @@
}
static final String BEAM_METRICS_USER_DISTRIBUTION_PREFIX =
- Enum.USER_DISTRIBUTION_COUNTER
- .getValueDescriptor()
- .getOptions()
- .getExtension(monitoringInfoSpec)
- .getUrn();
+ MonitoringInfoConstants.Urns.USER_DISTRIBUTION_COUNTER_PREFIX;
private Optional<String> validate(MonitoringInfo monitoringInfo) {
Optional<String> validatorResult = specValidator.validate(monitoringInfo);
@@ -78,7 +72,8 @@
BEAM_METRICS_USER_DISTRIBUTION_PREFIX, urn));
}
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
DataflowStepContext stepContext = transformIdMapping.get(ptransform);
if (stepContext == null) {
return Optional.of(
@@ -106,7 +101,8 @@
monitoringInfo.getMetric().getDistributionData().getIntDistributionData();
String urn = monitoringInfo.getUrn();
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
CounterStructuredNameAndMetadata name = new CounterStructuredNameAndMetadata();
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformer.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformer.java
index 93c67ab..0dfaee0 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformer.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformer.java
@@ -17,8 +17,6 @@
*/
package org.apache.beam.runners.dataflow.worker.fn.control;
-import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
-
import com.google.api.services.dataflow.model.CounterMetadata;
import com.google.api.services.dataflow.model.CounterStructuredName;
import com.google.api.services.dataflow.model.CounterStructuredNameAndMetadata;
@@ -27,7 +25,7 @@
import java.util.Optional;
import javax.annotation.Nullable;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
-import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs.Enum;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Origin;
@@ -55,8 +53,7 @@
this.specValidator = specMonitoringInfoValidator;
}
- static final String BEAM_METRICS_USER_PREFIX =
- Enum.USER_COUNTER.getValueDescriptor().getOptions().getExtension(monitoringInfoSpec).getUrn();
+ static final String BEAM_METRICS_USER_PREFIX = MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX;
private Optional<String> validate(MonitoringInfo monitoringInfo) {
Optional<String> validatorResult = specValidator.validate(monitoringInfo);
@@ -72,7 +69,8 @@
BEAM_METRICS_USER_PREFIX, urn));
}
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
DataflowStepContext stepContext = transformIdMapping.get(ptransform);
if (stepContext == null) {
return Optional.of(
@@ -99,7 +97,8 @@
long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
String urn = monitoringInfo.getUrn();
- final String ptransform = monitoringInfo.getLabelsMap().get("PTRANSFORM");
+ final String ptransform =
+ monitoringInfo.getLabelsMap().get(MonitoringInfoConstants.Labels.PTRANSFORM);
CounterStructuredNameAndMetadata name = new CounterStructuredNameAndMetadata();
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunction.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunction.java
index 64d824f..e08240a 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunction.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunction.java
@@ -33,8 +33,8 @@
import org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode;
import org.apache.beam.runners.dataflow.worker.graph.Nodes.Node;
import org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.fn.IdGenerator;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.MoreObjects;
import org.apache.beam.vendor.guava.v20_0.com.google.common.graph.MutableNetwork;
import org.apache.beam.vendor.guava.v20_0.com.google.common.graph.NetworkBuilder;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/Nodes.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/Nodes.java
index 139b274..e65b107 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/Nodes.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/Nodes.java
@@ -39,7 +39,7 @@
import org.apache.beam.runners.dataflow.worker.util.common.worker.Operation;
import org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver;
import org.apache.beam.sdk.coders.Coder;
-import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.WindowingStrategy;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.MoreObjects;
diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java
index 27419af..0d4ac47 100644
--- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java
@@ -135,11 +135,11 @@
// If we make it here, then we successfully invoked the above method, which means the profiler
// is available.
- LOG.warn("Profiling Agent found. Per-step profiling is enabled.");
+ LOG.info("Profiling Agent found. Per-step profiling is enabled.");
return ProfilingState.PROFILING_PRESENT;
} catch (UnsatisfiedLinkError e) {
// If we make it here, then the profiling agent wasn't linked in.
- LOG.warn("Profiling Agent not found. Profiles will not be available from this worker.");
+ LOG.info("Profiling Agent not found. Profiles will not be available from this worker.");
return ProfilingState.PROFILING_ABSENT;
}
}
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java
index 89ab688..4fc485f 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorkerTest.java
@@ -40,8 +40,8 @@
import java.util.ArrayList;
import org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions;
import org.apache.beam.runners.dataflow.util.TimeUtil;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Optional;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.hamcrest.Description;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java
index 975d602..3682545 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowBatchWorkerHarnessTest.java
@@ -30,10 +30,10 @@
import org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions;
import org.apache.beam.runners.dataflow.worker.testing.RestoreDataflowLoggingMDC;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.RestoreSystemProperties;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
-import org.apache.beam.sdk.util.Transport;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java
index 8c1b7b5..648c578 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkProgressUpdaterTest.java
@@ -40,7 +40,7 @@
import org.apache.beam.runners.dataflow.worker.util.common.worker.NativeReader.DynamicSplitRequest;
import org.apache.beam.runners.dataflow.worker.util.common.worker.NativeReader.DynamicSplitResult;
import org.apache.beam.runners.dataflow.worker.util.common.worker.StubbedExecutor;
-import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.junit.Before;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java
index 5bf3144..44c6319 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/DataflowWorkUnitClientTest.java
@@ -38,10 +38,10 @@
import org.apache.beam.runners.dataflow.worker.logging.DataflowWorkerLoggingMDC;
import org.apache.beam.runners.dataflow.worker.testing.RestoreDataflowLoggingMDC;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.RestoreSystemProperties;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Optional;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java
index 899fc59..cad7595 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java
@@ -82,6 +82,7 @@
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.fn.IdGenerator;
import org.apache.beam.sdk.fn.IdGenerators;
import org.apache.beam.sdk.options.PipelineOptions;
@@ -94,7 +95,6 @@
import org.apache.beam.sdk.util.DoFnInfo;
import org.apache.beam.sdk.util.SerializableUtils;
import org.apache.beam.sdk.util.StringUtils;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder;
import org.apache.beam.sdk.values.TupleTag;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java
index 5e8bbc8..3cff21a 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java
@@ -102,6 +102,7 @@
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
@@ -125,7 +126,6 @@
import org.apache.beam.sdk.util.DoFnInfo;
import org.apache.beam.sdk.util.SerializableUtils;
import org.apache.beam.sdk.util.StringUtils;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.util.VarInt;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder;
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java
index 38e3d7a..248b9e6 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutorTest.java
@@ -45,6 +45,7 @@
import org.apache.beam.runners.core.TimerInternals;
import org.apache.beam.runners.core.TimerInternals.TimerData;
import org.apache.beam.runners.core.metrics.ExecutionStateTracker;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
import org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation;
@@ -463,7 +464,7 @@
MonitoringInfo.newBuilder()
.setUrn("beam:metric:user:ExpectedCounter")
.setType("beam:metrics:sum_int_64")
- .putLabels("PTRANSFORM", "ExpectedPTransform")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "ExpectedPTransform")
.setMetric(
Metric.newBuilder()
.setCounterData(
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java
index a45463b..24b2491 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformerTest.java
@@ -27,6 +27,7 @@
import java.util.Map;
import java.util.Optional;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
import org.junit.Before;
@@ -78,7 +79,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:element_count:v1")
- .putLabels("PCOLLECTION", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "anyValue")
.build();
ElementCountMonitoringInfoToCounterUpdateTransformer testObject =
new ElementCountMonitoringInfoToCounterUpdateTransformer(
@@ -97,7 +98,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:element_count:v1")
- .putLabels("PCOLLECTION", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "anyValue")
.build();
ElementCountMonitoringInfoToCounterUpdateTransformer testObject =
new ElementCountMonitoringInfoToCounterUpdateTransformer(
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java
index 3635c6c..1a0f3b1 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/FnApiMonitoringInfoToCounterUpdateTransformerTest.java
@@ -26,6 +26,7 @@
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.junit.Before;
import org.junit.Test;
import org.mockito.Mock;
@@ -63,7 +64,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("user:prefix:anyNamespace:anyName")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
CounterUpdate result = testObject.transform(monitoringInfo);
@@ -91,7 +92,10 @@
when(mockGenericTransformer1.transform(any())).thenReturn(expectedResult);
MonitoringInfo monitoringInfo =
- MonitoringInfo.newBuilder().setUrn(validUrn).putLabels("PTRANSFORM", "anyValue").build();
+ MonitoringInfo.newBuilder()
+ .setUrn(validUrn)
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
+ .build();
CounterUpdate result = testObject.transform(monitoringInfo);
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformerTest.java
index c221cee..0dbedb7 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MSecMonitoringInfoToCounterUpdateTransformerTest.java
@@ -28,6 +28,7 @@
import java.util.Map;
import java.util.Optional;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
@@ -106,7 +107,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:counter:unsupported")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
exception.expect(RuntimeException.class);
@@ -135,7 +136,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:counter:supported")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
CounterUpdate result = testObject.transform(monitoringInfo);
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java
new file mode 100644
index 0000000..d554b07
--- /dev/null
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/MeanByteCountMonitoringInfoToCounterUpdateTransformerTest.java
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.fn.control;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.mockito.Matchers.any;
+import static org.mockito.Mockito.when;
+
+import com.google.api.services.dataflow.model.CounterUpdate;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
+import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
+import org.apache.beam.runners.dataflow.worker.counters.NameContext;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+
+public class MeanByteCountMonitoringInfoToCounterUpdateTransformerTest {
+
+ @Rule public final ExpectedException exception = ExpectedException.none();
+
+ @Mock private SpecMonitoringInfoValidator mockSpecValidator;
+
+ @Before
+ public void setUp() throws Exception {
+ MockitoAnnotations.initMocks(this);
+ }
+
+ @Test
+  public void testTransformReturnsNullIfSpecValidationFails() {
+ Map<String, NameContext> pcollectionNameMapping = new HashMap<>();
+    MeanByteCountMonitoringInfoToCounterUpdateTransformer testObject =
+        new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+            mockSpecValidator, pcollectionNameMapping);
+ Optional<String> error = Optional.of("Error text");
+ when(mockSpecValidator.validate(any())).thenReturn(error);
+ assertEquals(null, testObject.transform(null));
+ }
+
+ @Test
+ public void testTransformThrowsIfMonitoringInfoWithWrongUrnReceived() {
+ Map<String, NameContext> pcollectionNameMapping = new HashMap<>();
+ MonitoringInfo monitoringInfo =
+ MonitoringInfo.newBuilder().setUrn("beam:user:metric:element_count:v1").build();
+ MeanByteCountMonitoringInfoToCounterUpdateTransformer testObject =
+ new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+ mockSpecValidator, pcollectionNameMapping);
+ when(mockSpecValidator.validate(any())).thenReturn(Optional.empty());
+
+ exception.expect(RuntimeException.class);
+ testObject.transform(monitoringInfo);
+ }
+
+ @Test
+ public void testTransformReturnsNullIfMonitoringInfoWithUnknownPCollectionLabelPresent() {
+ Map<String, NameContext> pcollectionNameMapping = new HashMap<>();
+ MonitoringInfo monitoringInfo =
+ MonitoringInfo.newBuilder()
+ .setUrn("beam:metric:sampled_byte_size:v1")
+ .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "anyValue")
+ .build();
+ MeanByteCountMonitoringInfoToCounterUpdateTransformer testObject =
+ new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+ mockSpecValidator, pcollectionNameMapping);
+ when(mockSpecValidator.validate(any())).thenReturn(Optional.empty());
+ assertEquals(null, testObject.transform(monitoringInfo));
+ }
+
+ @Test
+ public void testTransformReturnsValidCounterUpdateWhenValidMonitoringInfoReceived() {
+ Map<String, NameContext> pcollectionNameMapping = new HashMap<>();
+ pcollectionNameMapping.put(
+ "anyValue",
+ NameContext.create("anyStageName", "anyOriginName", "anySystemName", "transformedValue"));
+
+ MonitoringInfo monitoringInfo =
+ MonitoringInfo.newBuilder()
+ .setUrn("beam:metric:sampled_byte_size:v1")
+ .putLabels(MonitoringInfoConstants.Labels.PCOLLECTION, "anyValue")
+ .build();
+ MeanByteCountMonitoringInfoToCounterUpdateTransformer testObject =
+ new MeanByteCountMonitoringInfoToCounterUpdateTransformer(
+ mockSpecValidator, pcollectionNameMapping);
+ when(mockSpecValidator.validate(any())).thenReturn(Optional.empty());
+
+ CounterUpdate result = testObject.transform(monitoringInfo);
+
+ assertNotEquals(null, result);
+ assertEquals(
+ "{cumulative=true, integerMean={count={highBits=0, lowBits=0}, "
+ + "sum={highBits=0, lowBits=0}}, "
+ + "nameAndKind={kind=MEAN, "
+ + "name=transformedValue-MeanByteCount}}",
+ result.toString());
+ }
+}
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java
index b7f0100..203ae8b 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserDistributionMonitoringInfoToCounterUpdateTransformerTest.java
@@ -28,6 +28,7 @@
import java.util.Map;
import java.util.Optional;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
@@ -80,7 +81,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:user_distribution:anyNamespace:anyName")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
UserDistributionMonitoringInfoToCounterUpdateTransformer testObject =
new UserDistributionMonitoringInfoToCounterUpdateTransformer(
@@ -101,7 +102,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:user_distribution:anyNamespace:anyName")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
UserDistributionMonitoringInfoToCounterUpdateTransformer testObject =
new UserDistributionMonitoringInfoToCounterUpdateTransformer(
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java
index a401c90..6d65e1b 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/control/UserMonitoringInfoToCounterUpdateTransformerTest.java
@@ -28,6 +28,7 @@
import java.util.Map;
import java.util.Optional;
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SpecMonitoringInfoValidator;
import org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext;
import org.apache.beam.runners.dataflow.worker.counters.NameContext;
@@ -78,7 +79,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:user:anyNamespace:anyName")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
UserMonitoringInfoToCounterUpdateTransformer testObject =
new UserMonitoringInfoToCounterUpdateTransformer(mockSpecValidator, stepContextMapping);
@@ -98,7 +99,7 @@
MonitoringInfo monitoringInfo =
MonitoringInfo.newBuilder()
.setUrn("beam:metric:user:anyNamespace:anyName")
- .putLabels("PTRANSFORM", "anyValue")
+ .putLabels(MonitoringInfoConstants.Labels.PTRANSFORM, "anyValue")
.build();
UserMonitoringInfoToCounterUpdateTransformer testObject =
new UserMonitoringInfoToCounterUpdateTransformer(mockSpecValidator, stepContextMapping);
diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java
index aefedfb..b3ced60 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/graph/MapTaskToNetworkFunctionTest.java
@@ -44,8 +44,8 @@
import org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode;
import org.apache.beam.runners.dataflow.worker.graph.Nodes.Node;
import org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.fn.IdGenerators;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
import org.apache.beam.vendor.guava.v20_0.com.google.common.graph.Network;
diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java
index daa9b56..2724910 100644
--- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java
+++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java
@@ -27,6 +27,7 @@
import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactStagingService;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.kohsuke.args4j.Option;
+import org.kohsuke.args4j.spi.ExplicitBooleanOptionHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -88,8 +89,10 @@
@Option(
name = "--clean-artifacts-per-job",
- usage = "When true, remove each job's staged artifacts when it completes")
- private boolean cleanArtifactsPerJob = false;
+ usage = "When true, remove each job's staged artifacts when it completes",
+      // Allows boolean parameters that default to true to be explicitly set to false
+ handler = ExplicitBooleanOptionHandler.class)
+ private boolean cleanArtifactsPerJob = true;
@Option(
name = "--sdk-worker-parallelism",
diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java
index c39b8b7..a0ea37e 100644
--- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java
+++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/translation/PipelineTranslatorUtils.java
@@ -17,8 +17,12 @@
*/
package org.apache.beam.runners.fnexecution.translation;
+import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkNotNull;
+
import java.io.IOException;
+import java.util.Collection;
import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
import org.apache.beam.runners.core.construction.RehydratedComponents;
import org.apache.beam.runners.core.construction.WindowingStrategyTranslation;
import org.apache.beam.runners.core.construction.graph.PipelineNode;
@@ -76,4 +80,13 @@
e);
}
}
+
+ /** Indicates whether the given pipeline has any unbounded PCollections. */
+ public static boolean hasUnboundedPCollections(RunnerApi.Pipeline pipeline) {
+ checkNotNull(pipeline);
+    Collection<PCollection> pCollections = pipeline.getComponents().getPcollectionsMap().values();
+    // Assume that all PCollections are consumed at some point in the pipeline.
+    return pCollections.stream()
+ .anyMatch(pc -> pc.getIsBounded() == RunnerApi.IsBounded.Enum.UNBOUNDED);
+ }
}
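A hedged sketch of how a portable runner might consume the new helper; the class name and the batch/streaming branch are illustrative assumptions only:

    import org.apache.beam.model.pipeline.v1.RunnerApi;
    import org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils;

    /** Illustrative only: pick a translation mode from pipeline boundedness. */
    class TranslationModeChooser {
      static String chooseMode(RunnerApi.Pipeline pipeline) {
        // hasUnboundedPCollections inspects every PCollection in the pipeline components.
        return PipelineTranslatorUtils.hasUnboundedPCollections(pipeline) ? "STREAMING" : "BATCH";
      }
    }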
diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
index e32cd79..4839fef 100644
--- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
+++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java
@@ -52,6 +52,8 @@
import org.apache.beam.runners.core.construction.graph.ExecutableStage;
import org.apache.beam.runners.core.construction.graph.FusedPipeline;
import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Urns;
import org.apache.beam.runners.core.metrics.MonitoringInfoMatchers;
import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
@@ -656,33 +658,33 @@
// The element counter should be counted only once for the pcollection.
// So there should be only two elements.
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("impulse.out");
builder.setInt64Value(2);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("create/ParMultiDo(Anonymous).output");
builder.setInt64Value(3);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
// Verify that the element count is not double counted if two PCollections consume it.
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("processA/ParMultiDo(Anonymous).output");
builder.setInt64Value(6);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("processB/ParMultiDo(Anonymous).output");
builder.setInt64Value(6);
matchers.add(MonitoringInfoMatchers.matchSetFields(builder.build()));
// Check for execution time metrics for the testPTransformId
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.START_BUNDLE_MSECS_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.START_BUNDLE_MSECS);
builder.setInt64TypeUrn();
builder.setPTransformLabel(testPTransformId);
matchers.add(
@@ -692,7 +694,7 @@
// Check for execution time metrics for the testPTransformId
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.PROCESS_BUNDLE_MSECS_URN);
+ builder.setUrn(Urns.PROCESS_BUNDLE_MSECS);
builder.setInt64TypeUrn();
builder.setPTransformLabel(testPTransformId);
matchers.add(
@@ -701,7 +703,7 @@
MonitoringInfoMatchers.valueGreaterThan(0)));
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.FINISH_BUNDLE_MSECS_URN);
+ builder.setUrn(Urns.FINISH_BUNDLE_MSECS);
builder.setInt64TypeUrn();
builder.setPTransformLabel(testPTransformId);
matchers.add(
diff --git a/runners/spark/build.gradle b/runners/spark/build.gradle
index 9dcdee8..f37a9d0 100644
--- a/runners/spark/build.gradle
+++ b/runners/spark/build.gradle
@@ -58,10 +58,12 @@
shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
shadow project(path: ":beam-runners-core-construction-java", configuration: "shadow")
shadow project(path: ":beam-runners-core-java", configuration: "shadow")
+ shadow project(path: ":beam-runners-java-fn-execution", configuration: "shadow")
shadow library.java.guava
shadow library.java.jackson_annotations
shadow library.java.slf4j_api
shadow library.java.joda_time
+ shadow library.java.args4j
shadow "io.dropwizard.metrics:metrics-core:3.1.2"
shadow library.java.jackson_module_scala
provided library.java.spark_core
@@ -81,6 +83,7 @@
shadowTest project(path: ":beam-sdks-java-core", configuration: "shadowTest")
// SparkStateInternalsTest extends abstract StateInternalsTest
shadowTest project(path: ":beam-runners-core-java", configuration: "shadowTest")
+ shadowTest project(":beam-sdks-java-harness")
shadowTest library.java.avro
shadowTest library.java.kafka_clients
shadowTest library.java.junit
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java
new file mode 100644
index 0000000..cea4b07
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobInvoker.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import java.util.UUID;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
+import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
+import org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation;
+import org.apache.beam.runners.fnexecution.jobsubmission.JobInvoker;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.Struct;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.ListeningExecutorService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Creates a job invocation to manage the Spark runner's execution of a portable pipeline. */
+public class SparkJobInvoker extends JobInvoker {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkJobInvoker.class);
+
+ public static SparkJobInvoker create() {
+ return new SparkJobInvoker();
+ }
+
+ private SparkJobInvoker() {
+ super("spark-runner-job-invoker");
+ }
+
+ @Override
+ protected JobInvocation invokeWithExecutor(
+ Pipeline pipeline,
+ Struct options,
+ @Nullable String retrievalToken,
+ ListeningExecutorService executorService) {
+ LOG.trace("Parsing pipeline options");
+ SparkPipelineOptions sparkOptions =
+ PipelineOptionsTranslation.fromProto(options).as(SparkPipelineOptions.class);
+
+ String invocationId =
+ String.format("%s_%s", sparkOptions.getJobName(), UUID.randomUUID().toString());
+ LOG.info("Invoking job {}", invocationId);
+
+ return createJobInvocation(
+ invocationId, retrievalToken, executorService, pipeline, sparkOptions);
+ }
+
+ static JobInvocation createJobInvocation(
+ String invocationId,
+ String retrievalToken,
+ ListeningExecutorService executorService,
+ Pipeline pipeline,
+ SparkPipelineOptions sparkOptions) {
+ JobInfo jobInfo =
+ JobInfo.create(
+ invocationId,
+ sparkOptions.getJobName(),
+ retrievalToken,
+ PipelineOptionsTranslation.toProto(sparkOptions));
+ SparkPipelineRunner pipelineRunner = new SparkPipelineRunner(sparkOptions);
+ return new JobInvocation(jobInfo, executorService, pipeline, pipelineRunner);
+ }
+}
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java
new file mode 100644
index 0000000..387907f
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.jobsubmission.JobInvoker;
+import org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.kohsuke.args4j.CmdLineException;
+import org.kohsuke.args4j.CmdLineParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Driver program that starts a job server for the Spark runner. */
+public class SparkJobServerDriver extends JobServerDriver {
+
+ @Override
+ protected JobInvoker createJobInvoker() {
+ return SparkJobInvoker.create();
+ }
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkJobServerDriver.class);
+
+ public static void main(String[] args) {
+ FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());
+ fromParams(args).run();
+ }
+
+ private static void printUsage(CmdLineParser parser) {
+ System.err.println(
+ String.format("Usage: java %s arguments...", SparkJobServerDriver.class.getSimpleName()));
+ parser.printUsage(System.err);
+ System.err.println();
+ }
+
+ private static SparkJobServerDriver fromParams(String[] args) {
+ ServerConfiguration configuration = new ServerConfiguration();
+ CmdLineParser parser = new CmdLineParser(configuration);
+ try {
+ parser.parseArgument(args);
+ } catch (CmdLineException e) {
+ LOG.error("Unable to parse command line arguments.", e);
+ printUsage(parser);
+ throw new IllegalArgumentException("Unable to parse command line arguments.", e);
+ }
+
+ return fromConfig(configuration);
+ }
+
+ private static SparkJobServerDriver fromConfig(ServerConfiguration configuration) {
+ return create(
+ configuration,
+ createJobServerFactory(configuration),
+ createArtifactServerFactory(configuration));
+ }
+
+ private static SparkJobServerDriver create(
+ ServerConfiguration configuration,
+ ServerFactory jobServerFactory,
+ ServerFactory artifactServerFactory) {
+ return new SparkJobServerDriver(configuration, jobServerFactory, artifactServerFactory);
+ }
+
+ private SparkJobServerDriver(
+ ServerConfiguration configuration,
+ ServerFactory jobServerFactory,
+ ServerFactory artifactServerFactory) {
+ super(configuration, jobServerFactory, artifactServerFactory);
+ }
+}
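A hedged sketch of launching the new job server from Java; the flag shown is the one visible in this change, and other ServerConfiguration options are omitted:

    import org.apache.beam.runners.spark.SparkJobServerDriver;

    /** Illustrative only: start the Spark job server with artifact cleanup disabled. */
    class StartSparkJobServer {
      public static void main(String[] args) {
        // Parses the flags and starts the job server (see fromParams/run above).
        SparkJobServerDriver.main(new String[] {"--clean-artifacts-per-job", "false"});
      }
    }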
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
new file mode 100644
index 0000000..b4d25ab
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
+import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
+import org.apache.beam.runners.core.construction.graph.PipelineTrimmer;
+import org.apache.beam.runners.fnexecution.jobsubmission.PortablePipelineRunner;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.spark.aggregators.AggregatorsAccumulator;
+import org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator;
+import org.apache.beam.runners.spark.translation.SparkContextFactory;
+import org.apache.beam.runners.spark.translation.SparkTranslationContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Runs a portable pipeline on Apache Spark. */
+public class SparkPipelineRunner implements PortablePipelineRunner {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkPipelineRunner.class);
+
+ private final SparkPipelineOptions pipelineOptions;
+
+ public SparkPipelineRunner(SparkPipelineOptions pipelineOptions) {
+ this.pipelineOptions = pipelineOptions;
+ }
+
+ @Override
+ public SparkPipelineResult run(RunnerApi.Pipeline pipeline, JobInfo jobInfo) {
+ SparkBatchPortablePipelineTranslator translator = new SparkBatchPortablePipelineTranslator();
+
+ // Don't let the fuser fuse any subcomponents of native transforms.
+ Pipeline trimmedPipeline = PipelineTrimmer.trim(pipeline, translator.knownUrns());
+ Pipeline fusedPipeline = GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline();
+
+ final JavaSparkContext jsc = SparkContextFactory.getSparkContext(pipelineOptions);
+ LOG.info(String.format("Running job %s on Spark master %s", jobInfo.jobId(), jsc.master()));
+ AggregatorsAccumulator.init(pipelineOptions, jsc);
+ final SparkTranslationContext context =
+ new SparkTranslationContext(jsc, pipelineOptions, jobInfo);
+ final ExecutorService executorService = Executors.newSingleThreadExecutor();
+ final Future<?> submissionFuture =
+ executorService.submit(
+ () -> {
+ translator.translate(fusedPipeline, context);
+ LOG.info(
+ String.format(
+ "Job %s: Pipeline translated successfully. Computing outputs",
+ jobInfo.jobId()));
+ context.computeOutputs();
+ LOG.info(String.format("Job %s finished.", jobInfo.jobId()));
+ });
+
+ SparkPipelineResult result = new SparkPipelineResult.BatchMode(submissionFuture, jsc);
+ result.waitUntilFinish();
+ executorService.shutdown();
+ return result;
+ }
+}
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
index 9cd805a..1e620e7 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
@@ -117,6 +117,6 @@
@Override
public void setName(String name) {
- rdd.setName(name);
+ getRDD().setName(name);
}
}
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java
new file mode 100644
index 0000000..c65caa4
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.translation;
+
+import static org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.createOutputMap;
+import static org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.getWindowingStrategy;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.runners.core.SystemReduceFn;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.core.construction.graph.PipelineNode;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.apache.beam.runners.core.construction.graph.QueryablePipeline;
+import org.apache.beam.runners.fnexecution.wire.WireCoders;
+import org.apache.beam.runners.spark.SparkPipelineOptions;
+import org.apache.beam.runners.spark.aggregators.AggregatorsAccumulator;
+import org.apache.beam.sdk.coders.ByteArrayCoder;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.transforms.join.RawUnionValue;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.transforms.windowing.TimestampCombiner;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.WindowingStrategy;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.BiMap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+import org.apache.spark.HashPartitioner;
+import org.apache.spark.Partitioner;
+import org.apache.spark.api.java.JavaRDD;
+
+/** Translates a bounded portable pipeline into a Spark job. */
+public class SparkBatchPortablePipelineTranslator {
+
+ private final ImmutableMap<String, PTransformTranslator> urnToTransformTranslator;
+
+ interface PTransformTranslator {
+
+ /** Translates transformNode from Beam into the Spark context. */
+ void translate(
+ PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context);
+ }
+
+ public Set<String> knownUrns() {
+ return urnToTransformTranslator.keySet();
+ }
+
+ public SparkBatchPortablePipelineTranslator() {
+ ImmutableMap.Builder<String, PTransformTranslator> translatorMap = ImmutableMap.builder();
+ translatorMap.put(
+ PTransformTranslation.IMPULSE_TRANSFORM_URN,
+ SparkBatchPortablePipelineTranslator::translateImpulse);
+ translatorMap.put(
+ PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN,
+ SparkBatchPortablePipelineTranslator::translateGroupByKey);
+ translatorMap.put(
+ ExecutableStage.URN, SparkBatchPortablePipelineTranslator::translateExecutableStage);
+ this.urnToTransformTranslator = translatorMap.build();
+ }
+
+ /** Translates pipeline from Beam into the Spark context. */
+ public void translate(final RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
+ QueryablePipeline p =
+ QueryablePipeline.forTransforms(
+ pipeline.getRootTransformIdsList(), pipeline.getComponents());
+ for (PipelineNode.PTransformNode transformNode : p.getTopologicallyOrderedTransforms()) {
+ urnToTransformTranslator
+ .getOrDefault(
+ transformNode.getTransform().getSpec().getUrn(),
+ SparkBatchPortablePipelineTranslator::urnNotFound)
+ .translate(transformNode, pipeline, context);
+ }
+ }
+
+ private static void urnNotFound(
+ PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
+ throw new IllegalArgumentException(
+ String.format(
+ "Transform %s has unknown URN %s",
+ transformNode.getId(), transformNode.getTransform().getSpec().getUrn()));
+ }
+
+ private static void translateImpulse(
+ PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
+ BoundedDataset<byte[]> output =
+ new BoundedDataset<>(
+ Collections.singletonList(new byte[0]), context.getSparkContext(), ByteArrayCoder.of());
+ context.pushDataset(getOutputId(transformNode), output);
+ }
+
+ private static <K, V> void translateGroupByKey(
+ PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
+
+ RunnerApi.Components components = pipeline.getComponents();
+ String inputId = getInputId(transformNode);
+ PCollection inputPCollection = components.getPcollectionsOrThrow(inputId);
+ Dataset inputDataset = context.popDataset(inputId);
+ JavaRDD<WindowedValue<KV<K, V>>> inputRdd = ((BoundedDataset<KV<K, V>>) inputDataset).getRDD();
+ PCollectionNode inputPCollectionNode = PipelineNode.pCollection(inputId, inputPCollection);
+ WindowedValueCoder<KV<K, V>> inputCoder;
+ try {
+ inputCoder =
+ (WindowedValueCoder)
+ WireCoders.instantiateRunnerWireCoder(inputPCollectionNode, components);
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ KvCoder<K, V> inputKvCoder = (KvCoder<K, V>) inputCoder.getValueCoder();
+ Coder<K> inputKeyCoder = inputKvCoder.getKeyCoder();
+ Coder<V> inputValueCoder = inputKvCoder.getValueCoder();
+ WindowingStrategy windowingStrategy = getWindowingStrategy(inputId, components);
+ WindowFn<Object, BoundedWindow> windowFn = windowingStrategy.getWindowFn();
+ WindowedValue.WindowedValueCoder<V> wvCoder =
+ WindowedValue.FullWindowedValueCoder.of(inputValueCoder, windowFn.windowCoder());
+
+ JavaRDD<WindowedValue<KV<K, Iterable<V>>>> groupedByKeyAndWindow;
+ if (windowingStrategy.getWindowFn().isNonMerging()
+ && windowingStrategy.getTimestampCombiner() == TimestampCombiner.END_OF_WINDOW) {
+ // We can use a memory-sensitive translation for non-merging windows.
+ groupedByKeyAndWindow =
+ GroupNonMergingWindowsFunctions.groupByKeyAndWindow(
+ inputRdd, inputKeyCoder, inputValueCoder, windowingStrategy);
+ } else {
+ Partitioner partitioner = getPartitioner(context);
+ JavaRDD<WindowedValue<KV<K, Iterable<WindowedValue<V>>>>> groupedByKeyOnly =
+ GroupCombineFunctions.groupByKeyOnly(inputRdd, inputKeyCoder, wvCoder, partitioner);
+ // for batch, GroupAlsoByWindow uses an in-memory StateInternals.
+ groupedByKeyAndWindow =
+ groupedByKeyOnly.flatMap(
+ new SparkGroupAlsoByWindowViaOutputBufferFn<>(
+ windowingStrategy,
+ new TranslationUtils.InMemoryStateInternalsFactory<>(),
+ SystemReduceFn.buffering(inputValueCoder),
+ context.serializablePipelineOptions,
+ AggregatorsAccumulator.getInstance()));
+ }
+ context.pushDataset(getOutputId(transformNode), new BoundedDataset<>(groupedByKeyAndWindow));
+ }
+
+ private static <InputT, OutputT> void translateExecutableStage(
+ PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
+
+ RunnerApi.ExecutableStagePayload stagePayload;
+ try {
+ stagePayload =
+ RunnerApi.ExecutableStagePayload.parseFrom(
+ transformNode.getTransform().getSpec().getPayload());
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ String inputPCollectionId = stagePayload.getInput();
+ Dataset inputDataset = context.popDataset(inputPCollectionId);
+ JavaRDD<WindowedValue<InputT>> inputRdd = ((BoundedDataset<InputT>) inputDataset).getRDD();
+ Map<String, String> outputs = transformNode.getTransform().getOutputsMap();
+ BiMap<String, Integer> outputMap = createOutputMap(outputs.values());
+
+ SparkExecutableStageFunction<InputT> function =
+ new SparkExecutableStageFunction<>(stagePayload, context.jobInfo, outputMap);
+ JavaRDD<RawUnionValue> staged = inputRdd.mapPartitions(function);
+
+ for (String outputId : outputs.values()) {
+ JavaRDD<WindowedValue<OutputT>> outputRdd =
+ staged.flatMap(new SparkExecutableStageExtractionFunction<>(outputMap.get(outputId)));
+ context.pushDataset(outputId, new BoundedDataset<>(outputRdd));
+ }
+ }
+
+ @Nullable
+ private static Partitioner getPartitioner(SparkTranslationContext context) {
+ Long bundleSize =
+ context.serializablePipelineOptions.get().as(SparkPipelineOptions.class).getBundleSize();
+ return (bundleSize > 0)
+ ? null
+ : new HashPartitioner(context.getSparkContext().defaultParallelism());
+ }
+
+ private static String getInputId(PTransformNode transformNode) {
+ return Iterables.getOnlyElement(transformNode.getTransform().getInputsMap().values());
+ }
+
+ private static String getOutputId(PTransformNode transformNode) {
+ return Iterables.getOnlyElement(transformNode.getTransform().getOutputsMap().values());
+ }
+}
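The translator above dispatches purely on the transform URN, so supporting another primitive amounts to one more entry in the constructor's map plus a matching translation method. A minimal sketch of that extension point, using a hypothetical translateFlatten that unions the input RDDs (illustrative only, not part of this change; a real implementation would also handle the zero-input case):

    // Hypothetical registration in the constructor:
    translatorMap.put(
        PTransformTranslation.FLATTEN_TRANSFORM_URN,
        SparkBatchPortablePipelineTranslator::translateFlatten);

    // Hypothetical translator: pop every input dataset, union the RDDs, push the result.
    private static void translateFlatten(
        PTransformNode transformNode, RunnerApi.Pipeline pipeline, SparkTranslationContext context) {
      JavaRDD<WindowedValue<Object>> union = null;
      for (String inputId : transformNode.getTransform().getInputsMap().values()) {
        JavaRDD<WindowedValue<Object>> rdd =
            ((BoundedDataset<Object>) context.popDataset(inputId)).getRDD();
        union = (union == null) ? rdd : union.union(rdd);
      }
      context.pushDataset(getOutputId(transformNode), new BoundedDataset<>(union));
    }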
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageExtractionFunction.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageExtractionFunction.java
new file mode 100644
index 0000000..a5d1bbd
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageExtractionFunction.java
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.translation;
+
+import java.util.Collections;
+import java.util.Iterator;
+import org.apache.beam.sdk.transforms.join.RawUnionValue;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.api.java.function.FlatMapFunction;
+
+class SparkExecutableStageExtractionFunction<OutputT>
+ implements FlatMapFunction<RawUnionValue, WindowedValue<OutputT>> {
+ private final int unionTag;
+
+ SparkExecutableStageExtractionFunction(int unionTag) {
+ this.unionTag = unionTag;
+ }
+
+ @Override
+ public Iterator<WindowedValue<OutputT>> call(RawUnionValue rawUnionValue) {
+ if (rawUnionValue.getUnionTag() == unionTag) {
+ WindowedValue<OutputT> output = (WindowedValue<OutputT>) rawUnionValue.getValue();
+ return Collections.singleton(output).iterator();
+ }
+ return Collections.emptyIterator();
+ }
+}
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java
new file mode 100644
index 0000000..93250bc
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.translation;
+
+import java.io.Serializable;
+import java.util.EnumMap;
+import java.util.Iterator;
+import java.util.Locale;
+import java.util.Map;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleProgressResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.control.BundleProgressHandler;
+import org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory;
+import org.apache.beam.runners.fnexecution.control.JobBundleFactory;
+import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory;
+import org.apache.beam.runners.fnexecution.control.RemoteBundle;
+import org.apache.beam.runners.fnexecution.control.StageBundleFactory;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandlers;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.transforms.join.RawUnionValue;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Spark function that passes its input through an SDK-executed {@link
+ * org.apache.beam.runners.core.construction.graph.ExecutableStage}.
+ *
+ * <p>The output of this operation is a multiplexed {@link Dataset} whose elements are tagged with a
+ * union coder. The coder's tags are determined by {@link SparkExecutableStageFunction#outputMap}.
+ * The resulting data set should be further processed by a {@link
+ * SparkExecutableStageExtractionFunction}.
+ */
+public class SparkExecutableStageFunction<InputT>
+ implements FlatMapFunction<Iterator<WindowedValue<InputT>>, RawUnionValue> {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkExecutableStageFunction.class);
+
+ private final RunnerApi.ExecutableStagePayload stagePayload;
+ private final Map<String, Integer> outputMap;
+ private final JobBundleFactoryCreator jobBundleFactoryCreator;
+
+ SparkExecutableStageFunction(
+ RunnerApi.ExecutableStagePayload stagePayload,
+ JobInfo jobInfo,
+ Map<String, Integer> outputMap) {
+ this(stagePayload, outputMap, () -> DefaultJobBundleFactory.create(jobInfo));
+ }
+
+ SparkExecutableStageFunction(
+ RunnerApi.ExecutableStagePayload stagePayload,
+ Map<String, Integer> outputMap,
+ JobBundleFactoryCreator jobBundleFactoryCreator) {
+ this.stagePayload = stagePayload;
+ this.outputMap = outputMap;
+ this.jobBundleFactoryCreator = jobBundleFactoryCreator;
+ }
+
+ @Override
+ public Iterator<RawUnionValue> call(Iterator<WindowedValue<InputT>> inputs) throws Exception {
+ JobBundleFactory jobBundleFactory = jobBundleFactoryCreator.create();
+ ExecutableStage executableStage = ExecutableStage.fromPayload(stagePayload);
+ try (StageBundleFactory stageBundleFactory = jobBundleFactory.forStage(executableStage)) {
+ ConcurrentLinkedQueue<RawUnionValue> collector = new ConcurrentLinkedQueue<>();
+ ReceiverFactory receiverFactory = new ReceiverFactory(collector, outputMap);
+ EnumMap<TypeCase, StateRequestHandler> handlers = new EnumMap<>(StateKey.TypeCase.class);
+ // TODO add state request handlers
+ StateRequestHandler stateRequestHandler =
+ StateRequestHandlers.delegateBasedUponType(handlers);
+ SparkBundleProgressHandler bundleProgressHandler = new SparkBundleProgressHandler();
+ try (RemoteBundle bundle =
+ stageBundleFactory.getBundle(
+ receiverFactory, stateRequestHandler, bundleProgressHandler)) {
+ String inputPCollectionId = executableStage.getInputPCollection().getId();
+ FnDataReceiver<WindowedValue<?>> mainReceiver =
+ bundle.getInputReceivers().get(inputPCollectionId);
+ while (inputs.hasNext()) {
+ WindowedValue<InputT> input = inputs.next();
+ mainReceiver.accept(input);
+ }
+ }
+ return collector.iterator();
+ } catch (Exception e) {
+ LOG.error("Spark executable stage fn terminated with exception: ", e);
+ throw e;
+ }
+ }
+
+ interface JobBundleFactoryCreator extends Serializable {
+ JobBundleFactory create();
+ }
+
+ /**
+ * Receiver factory that wraps outgoing elements with the corresponding union tag for a
+ * multiplexed PCollection.
+ */
+ private static class ReceiverFactory implements OutputReceiverFactory {
+
+ private final ConcurrentLinkedQueue<RawUnionValue> collector;
+ private final Map<String, Integer> outputMap;
+
+ ReceiverFactory(
+ ConcurrentLinkedQueue<RawUnionValue> collector, Map<String, Integer> outputMap) {
+ this.collector = collector;
+ this.outputMap = outputMap;
+ }
+
+ @Override
+ public <OutputT> FnDataReceiver<OutputT> create(String pCollectionId) {
+ Integer unionTag = outputMap.get(pCollectionId);
+ if (unionTag == null) {
+ throw new IllegalStateException(
+ String.format(Locale.ENGLISH, "Unknown PCollectionId %s", pCollectionId));
+ }
+ int tagInt = unionTag;
+ return receivedElement -> collector.add(new RawUnionValue(tagInt, receivedElement));
+ }
+ }
+
+ private static class SparkBundleProgressHandler implements BundleProgressHandler {
+
+ @Override
+ public void onProgress(ProcessBundleProgressResponse progress) {
+ // TODO
+ }
+
+ @Override
+ public void onCompleted(ProcessBundleResponse response) {
+ // TODO
+ }
+ }
+}
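As the class Javadoc notes, each output PCollection is assigned a union tag via outputMap; the receiver created for a given PCollection id wraps every element in a RawUnionValue carrying that tag, and SparkExecutableStageExtractionFunction later keeps only the values matching its own tag. A rough illustration with made-up ids and values:

    Map<String, Integer> outputMap = ImmutableMap.of("pCollA", 0, "pCollB", 1);
    // An element the SDK harness emits to "pCollB" is collected as:
    RawUnionValue tagged = new RawUnionValue(1, WindowedValue.valueInGlobalWindow("x"));
    // Downstream, new SparkExecutableStageExtractionFunction<>(1) yields the element,
    // while new SparkExecutableStageExtractionFunction<>(0) drops it.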
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkTranslationContext.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkTranslationContext.java
new file mode 100644
index 0000000..237e735
--- /dev/null
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkTranslationContext.java
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.translation;
+
+import java.util.LinkedHashMap;
+import java.util.LinkedHashSet;
+import java.util.Map;
+import java.util.Set;
+import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.spark.api.java.JavaSparkContext;
+
+/**
+ * Translation context used to lazily store Spark data sets during portable pipeline translation and
+ * compute them after translation.
+ */
+public class SparkTranslationContext {
+ private final JavaSparkContext jsc;
+ final JobInfo jobInfo;
+ private final Map<String, Dataset> datasets = new LinkedHashMap<>();
+ private final Set<Dataset> leaves = new LinkedHashSet<>();
+ final SerializablePipelineOptions serializablePipelineOptions;
+
+ public SparkTranslationContext(JavaSparkContext jsc, PipelineOptions options, JobInfo jobInfo) {
+ this.jsc = jsc;
+ this.serializablePipelineOptions = new SerializablePipelineOptions(options);
+ this.jobInfo = jobInfo;
+ }
+
+ public JavaSparkContext getSparkContext() {
+ return jsc;
+ }
+
+ /** Add output of transform to context. */
+ public void pushDataset(String pCollectionId, Dataset dataset) {
+ dataset.setName(pCollectionId);
+ // TODO cache
+ datasets.put(pCollectionId, dataset);
+ leaves.add(dataset);
+ }
+
+ /** Retrieve the dataset for the pCollection id and remove it from the DAG's leaves. */
+ public Dataset popDataset(String pCollectionId) {
+ Dataset dataset = datasets.get(pCollectionId);
+ leaves.remove(dataset);
+ return dataset;
+ }
+
+ /** Compute the outputs for all RDDs that are leaves in the DAG. */
+ public void computeOutputs() {
+ for (Dataset dataset : leaves) {
+ dataset.action(); // force computation.
+ }
+ }
+}
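During translation each transform pops the datasets it consumes and pushes the ones it produces, so whatever is never popped remains a leaf, and computeOutputs() forces a Spark action on exactly those leaves. A minimal sketch of that lifecycle, with made-up PCollection ids and hypothetical dataset variables:

    context.pushDataset("impulse.out", impulseDataset);   // impulse.out is now a leaf
    Dataset consumed = context.popDataset("impulse.out");  // no longer a leaf
    context.pushDataset("stage.out", stageOutputDataset);  // stage.out becomes the only leaf
    context.computeOutputs();                               // runs an action on stage.out only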
diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPortableExecutionTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPortableExecutionTest.java
new file mode 100644
index 0000000..ad97ec0
--- /dev/null
+++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/SparkPortableExecutionTest.java
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import java.io.Serializable;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import org.apache.beam.model.jobmanagement.v1.JobApi.JobState.Enum;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.runners.core.construction.Environments;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
+import org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
+import org.apache.beam.sdk.testing.CrashingRunner;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.ListeningExecutorService;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.MoreExecutors;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Tests the execution of a pipeline from specification to execution on the portable Spark runner.
+ */
+public class SparkPortableExecutionTest implements Serializable {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkPortableExecutionTest.class);
+
+ private static ListeningExecutorService sparkJobExecutor;
+
+ @BeforeClass
+ public static void setup() {
+ // Restrict this to only one thread to avoid multiple Spark clusters being up at the same time,
+ // which is not suitable for memory-constrained environments, e.g. Jenkins.
+ sparkJobExecutor = MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(1));
+ }
+
+ @AfterClass
+ public static void tearDown() throws InterruptedException {
+ sparkJobExecutor.shutdown();
+ sparkJobExecutor.awaitTermination(10, TimeUnit.SECONDS);
+ if (!sparkJobExecutor.isShutdown()) {
+ LOG.warn("Could not shut down Spark job executor");
+ }
+ sparkJobExecutor = null;
+ }
+
+ @Test(timeout = 120_000)
+ public void testExecution() throws Exception {
+ PipelineOptions options = PipelineOptionsFactory.create();
+ options.setRunner(CrashingRunner.class);
+ options
+ .as(PortablePipelineOptions.class)
+ .setDefaultEnvironmentType(Environments.ENVIRONMENT_EMBEDDED);
+
+ Pipeline p = Pipeline.create(options);
+ PCollection<KV<String, Iterable<Long>>> result =
+ p.apply("impulse", Impulse.create())
+ .apply(
+ "create",
+ ParDo.of(
+ new DoFn<byte[], String>() {
+ @ProcessElement
+ public void process(ProcessContext context) {
+ context.output("zero");
+ context.output("one");
+ context.output("two");
+ }
+ }))
+ .apply(
+ "len",
+ ParDo.of(
+ new DoFn<String, Long>() {
+ @ProcessElement
+ public void process(ProcessContext context) {
+ context.output((long) context.element().length());
+ }
+ }))
+ .apply("addKeys", WithKeys.of("foo"))
+ // First GBK just to verify GBK works
+ .apply("gbk", GroupByKey.create())
+ .apply(
+ "print",
+ ParDo.of(
+ new DoFn<KV<String, Iterable<Long>>, KV<String, Long>>() {
+ @ProcessElement
+ public void process(ProcessContext context) {
+ LOG.info("Output element: {}", context.element());
+ for (Long i : context.element().getValue()) {
+ context.output(KV.of(context.element().getKey(), i));
+ }
+ }
+ }))
+ // Second GBK forces the output to be materialized
+ .apply("gbk2", GroupByKey.create());
+
+ // TODO Get PAssert working to test pipeline result
+
+ RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p);
+
+ JobInvocation jobInvocation =
+ SparkJobInvoker.createJobInvocation(
+ "fakeId",
+ "fakeRetrievalToken",
+ sparkJobExecutor,
+ pipelineProto,
+ options.as(SparkPipelineOptions.class));
+ jobInvocation.start();
+ while (jobInvocation.getState() != Enum.DONE) {
+ Thread.sleep(1000);
+ }
+ }
+}
diff --git a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java
new file mode 100644
index 0000000..8f1bdca
--- /dev/null
+++ b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunctionTest.java
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.translation;
+
+import static org.apache.beam.runners.core.construction.PTransformTranslation.PAR_DO_TRANSFORM_URN;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.contains;
+import static org.mockito.Matchers.any;
+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.verifyNoMoreInteractions;
+import static org.mockito.Mockito.when;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.Map;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.runners.fnexecution.control.BundleProgressHandler;
+import org.apache.beam.runners.fnexecution.control.JobBundleFactory;
+import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors;
+import org.apache.beam.runners.fnexecution.control.RemoteBundle;
+import org.apache.beam.runners.fnexecution.control.StageBundleFactory;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.runners.spark.translation.SparkExecutableStageFunction.JobBundleFactoryCreator;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.transforms.join.RawUnionValue;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.Mock;
+import org.mockito.Mockito;
+import org.mockito.MockitoAnnotations;
+
+/** Unit tests for {@link SparkExecutableStageFunction}. */
+public class SparkExecutableStageFunctionTest {
+ @Mock private JobBundleFactoryCreator jobBundleFactoryCreator;
+ @Mock private JobBundleFactory jobBundleFactory;
+ @Mock private StageBundleFactory stageBundleFactory;
+ @Mock private RemoteBundle remoteBundle;
+
+ private final String inputId = "input-id";
+ private final ExecutableStagePayload stagePayload =
+ ExecutableStagePayload.newBuilder()
+ .setInput(inputId)
+ .setComponents(
+ Components.newBuilder()
+ .putTransforms(
+ "transform-id",
+ RunnerApi.PTransform.newBuilder()
+ .putInputs("input-name", inputId)
+ .setSpec(RunnerApi.FunctionSpec.newBuilder().setUrn(PAR_DO_TRANSFORM_URN))
+ .build())
+ .putPcollections(inputId, PCollection.getDefaultInstance())
+ .build())
+ .build();
+
+ @Before
+ public void setUpMocks() throws Exception {
+ MockitoAnnotations.initMocks(this);
+ when(jobBundleFactoryCreator.create()).thenReturn(jobBundleFactory);
+ when(jobBundleFactory.forStage(any())).thenReturn(stageBundleFactory);
+ when(stageBundleFactory.getBundle(any(), any(), any())).thenReturn(remoteBundle);
+ @SuppressWarnings("unchecked")
+ ImmutableMap<String, FnDataReceiver<WindowedValue<?>>> inputReceiver =
+ ImmutableMap.of("input", Mockito.mock(FnDataReceiver.class));
+ when(remoteBundle.getInputReceivers()).thenReturn(inputReceiver);
+ }
+
+ @Test(expected = Exception.class)
+ public void sdkErrorsSurfaceOnClose() throws Exception {
+ SparkExecutableStageFunction<Integer> function = getFunction(Collections.emptyMap());
+ doThrow(new Exception()).when(remoteBundle).close();
+ function.call(Collections.emptyIterator());
+ }
+
+ @Test
+ public void expectedInputsAreSent() throws Exception {
+ SparkExecutableStageFunction<Integer> function = getFunction(Collections.emptyMap());
+
+ RemoteBundle bundle = Mockito.mock(RemoteBundle.class);
+ when(stageBundleFactory.getBundle(any(), any(), any())).thenReturn(bundle);
+
+ @SuppressWarnings("unchecked")
+ FnDataReceiver<WindowedValue<?>> receiver = Mockito.mock(FnDataReceiver.class);
+ when(bundle.getInputReceivers()).thenReturn(ImmutableMap.of(inputId, receiver));
+
+ WindowedValue<Integer> one = WindowedValue.valueInGlobalWindow(1);
+ WindowedValue<Integer> two = WindowedValue.valueInGlobalWindow(2);
+ WindowedValue<Integer> three = WindowedValue.valueInGlobalWindow(3);
+ function.call(Arrays.asList(one, two, three).iterator());
+
+ verify(receiver).accept(one);
+ verify(receiver).accept(two);
+ verify(receiver).accept(three);
+ verifyNoMoreInteractions(receiver);
+ }
+
+ @Test
+ public void outputsAreTaggedCorrectly() throws Exception {
+ WindowedValue<Integer> three = WindowedValue.valueInGlobalWindow(3);
+ WindowedValue<Integer> four = WindowedValue.valueInGlobalWindow(4);
+ WindowedValue<Integer> five = WindowedValue.valueInGlobalWindow(5);
+ Map<String, Integer> outputTagMap =
+ ImmutableMap.of(
+ "one", 1,
+ "two", 2,
+ "three", 3);
+
+ // We use a real StageBundleFactory here in order to exercise the output receiver factory.
+ StageBundleFactory stageBundleFactory =
+ new StageBundleFactory() {
+
+ private boolean once;
+
+ @Override
+ public RemoteBundle getBundle(
+ OutputReceiverFactory receiverFactory,
+ StateRequestHandler stateRequestHandler,
+ BundleProgressHandler progressHandler) {
+ return new RemoteBundle() {
+ @Override
+ public String getId() {
+ return "bundle-id";
+ }
+
+ @Override
+ public Map<String, FnDataReceiver<WindowedValue<?>>> getInputReceivers() {
+ return ImmutableMap.of(
+ "input",
+ input -> {
+ /* Ignore input */
+ });
+ }
+
+ @Override
+ public void close() throws Exception {
+ if (once) {
+ return;
+ }
+ // Emit all values to the runner when the bundle is closed.
+ receiverFactory.create("one").accept(three);
+ receiverFactory.create("two").accept(four);
+ receiverFactory.create("three").accept(five);
+ once = true;
+ }
+ };
+ }
+
+ @Override
+ public ProcessBundleDescriptors.ExecutableProcessBundleDescriptor
+ getProcessBundleDescriptor() {
+ return Mockito.mock(ProcessBundleDescriptors.ExecutableProcessBundleDescriptor.class);
+ }
+
+ @Override
+ public void close() {}
+ };
+ when(jobBundleFactory.forStage(any())).thenReturn(stageBundleFactory);
+
+ SparkExecutableStageFunction<Integer> function = getFunction(outputTagMap);
+ Iterator<RawUnionValue> iterator = function.call(Collections.emptyIterator());
+ Iterable<RawUnionValue> iterable = () -> iterator;
+
+ assertThat(
+ iterable,
+ contains(
+ new RawUnionValue(1, three), new RawUnionValue(2, four), new RawUnionValue(3, five)));
+ }
+
+ @Test
+ public void testStageBundleClosed() throws Exception {
+ SparkExecutableStageFunction<Integer> function = getFunction(Collections.emptyMap());
+ function.call(Collections.emptyIterator());
+ verify(stageBundleFactory).getBundle(any(), any(), any());
+ verify(stageBundleFactory).close();
+ verifyNoMoreInteractions(stageBundleFactory);
+ }
+
+ private <T> SparkExecutableStageFunction<T> getFunction(Map<String, Integer> outputMap) {
+ return new SparkExecutableStageFunction<>(stagePayload, outputMap, jobBundleFactoryCreator);
+ }
+}
diff --git a/sdks/go/cmd/starcgen/starcgen.go b/sdks/go/cmd/starcgen/starcgen.go
index 35ac88f..e3b627b 100644
--- a/sdks/go/cmd/starcgen/starcgen.go
+++ b/sdks/go/cmd/starcgen/starcgen.go
@@ -29,6 +29,7 @@
//
// //go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
// //go:generate starcgen --package=<mypackagename>
+// //go:generate go fmt
//
// This will generate registrations and shim types for all types and functions
// in the package, in a file `<mypackagename>.shims.go`.
@@ -38,6 +39,7 @@
//
// //go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
// //go:generate starcgen --package=<mypackagename> --inputs=foo.go --identifiers=myFn,myStructFn --output=custom.shims.go
+// //go:generate go fmt
//
package main
diff --git a/sdks/go/pkg/beam/core/graph/coder/coder.go b/sdks/go/pkg/beam/core/graph/coder/coder.go
index 89b5bd9..cfcca35 100644
--- a/sdks/go/pkg/beam/core/graph/coder/coder.go
+++ b/sdks/go/pkg/beam/core/graph/coder/coder.go
@@ -244,7 +244,7 @@
return &Coder{Kind: Bytes, T: typex.New(reflectx.ByteSlice)}
}
-// NewVarInt returns a new int32 coder using the built-in scheme.
+// NewVarInt returns a new int64 coder using the built-in scheme.
func NewVarInt() *Coder {
return &Coder{Kind: VarInt, T: typex.New(reflectx.Int64)}
}
diff --git a/sdks/go/pkg/beam/core/metrics/metrics.go b/sdks/go/pkg/beam/core/metrics/metrics.go
index d00f1f0..a29446b 100644
--- a/sdks/go/pkg/beam/core/metrics/metrics.go
+++ b/sdks/go/pkg/beam/core/metrics/metrics.go
@@ -82,15 +82,19 @@
switch key {
case bundleKey:
if ctx.bundleID == "" {
- if id := ctx.Value(key); id != nil {
+ if id := ctx.Context.Value(key); id != nil {
ctx.bundleID = id.(string)
+ } else {
+ return nil
}
}
return ctx.bundleID
case ptransformKey:
if ctx.ptransformID == "" {
- if id := ctx.Value(key); id != nil {
+ if id := ctx.Context.Value(key); id != nil {
ctx.ptransformID = id.(string)
+ } else {
+ return nil
}
}
return ctx.ptransformID
@@ -118,8 +122,13 @@
return &beamCtx{Context: ctx, ptransformID: id}
}
+const (
+ bundleIDUnset = "(bundle id unset)"
+ ptransformIDUnset = "(ptransform id unset)"
+)
+
func getContextKey(ctx context.Context, n name) key {
- key := key{name: n, bundle: "(bundle id unset)", ptransform: "(ptransform id unset)"}
+ key := key{name: n, bundle: bundleIDUnset, ptransform: ptransformIDUnset}
if id := ctx.Value(bundleKey); id != nil {
key.bundle = id.(string)
}
diff --git a/sdks/go/pkg/beam/core/metrics/metrics_test.go b/sdks/go/pkg/beam/core/metrics/metrics_test.go
index 6e28981..3ef2de9 100644
--- a/sdks/go/pkg/beam/core/metrics/metrics_test.go
+++ b/sdks/go/pkg/beam/core/metrics/metrics_test.go
@@ -25,6 +25,43 @@
// bID is a bundleId to use in the tests, if nothing more specific is needed.
const bID = "bID"
+// TestRobustness validates that metrics do not panic when the context doesn't
+// have the bundle or transform ID.
+func TestRobustness(t *testing.T) {
+ m := NewCounter("Test", "myCount")
+ m.Inc(context.Background(), 3)
+ ptCtx := SetPTransformID(context.Background(), "MY_TRANSFORM")
+ m.Inc(ptCtx, 3)
+ bCtx := SetBundleID(context.Background(), bID)
+ m.Inc(bCtx, 3)
+}
+
+func TestBeamContext(t *testing.T) {
+ t.Run("ptransformID", func(t *testing.T) {
+ ptID := "MY_TRANSFORM"
+ ctx := SetPTransformID(context.Background(), ptID)
+ key := getContextKey(ctx, name{})
+ if key.bundle != bundleIDUnset {
+ t.Errorf("bundleId = %q, want %q", key.bundle, bundleIDUnset)
+ }
+ if key.ptransform != ptID {
+ t.Errorf("ptransformId = %q, want %q", key.ptransform, ptID)
+ }
+
+ })
+
+ t.Run("bundleID", func(t *testing.T) {
+ ctx := SetBundleID(context.Background(), bID)
+ key := getContextKey(ctx, name{})
+ if key.bundle != bID {
+ t.Errorf("bundleId = %q, want %q", key.bundle, bID)
+ }
+ if key.ptransform != ptransformIDUnset {
+ t.Errorf("ptransformId = %q, want %q", key.ptransform, ptransformIDUnset)
+ }
+ })
+}
+
func ctxWith(b, pt string) context.Context {
ctx := context.Background()
ctx = SetPTransformID(ctx, pt)
diff --git a/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go b/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go
index be6e362..91cc99f 100644
--- a/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go
+++ b/sdks/go/pkg/beam/core/runtime/coderx/coderx.shims.go
@@ -45,25 +45,25 @@
runtime.RegisterFunction(encVarIntZ)
runtime.RegisterFunction(encVarUintZ)
runtime.RegisterType(reflect.TypeOf((*reflect.Type)(nil)).Elem())
- reflectx.RegisterFunc(reflect.TypeOf((*func(int32) []byte)(nil)).Elem(), funcMakerInt32ГSliceOfByte)
- reflectx.RegisterFunc(reflect.TypeOf((*func(int64) []byte)(nil)).Elem(), funcMakerInt64ГSliceOfByte)
- reflectx.RegisterFunc(reflect.TypeOf((*func(reflect.Type, []byte) (typex.T, error))(nil)).Elem(), funcMakerReflect۰TypeSliceOfByteГTypex۰TError)
- reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) int32)(nil)).Elem(), funcMakerSliceOfByteГInt32)
- reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) int64)(nil)).Elem(), funcMakerSliceOfByteГInt64)
- reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) typex.T)(nil)).Elem(), funcMakerSliceOfByteГTypex۰T)
- reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) uint32)(nil)).Elem(), funcMakerSliceOfByteГUint32)
- reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) uint64)(nil)).Elem(), funcMakerSliceOfByteГUint64)
- reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) []byte)(nil)).Elem(), funcMakerTypex۰TГSliceOfByte)
- reflectx.RegisterFunc(reflect.TypeOf((*func(uint32) []byte)(nil)).Elem(), funcMakerUint32ГSliceOfByte)
- reflectx.RegisterFunc(reflect.TypeOf((*func(uint64) []byte)(nil)).Elem(), funcMakerUint64ГSliceOfByte)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(int32) ([]byte))(nil)).Elem(), funcMakerInt32ГSliceOfByte)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(int64) ([]byte))(nil)).Elem(), funcMakerInt64ГSliceOfByte)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(reflect.Type,[]byte) (typex.T,error))(nil)).Elem(), funcMakerReflect۰TypeSliceOfByteГTypex۰TError)
+ reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) (int32))(nil)).Elem(), funcMakerSliceOfByteГInt32)
+ reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) (int64))(nil)).Elem(), funcMakerSliceOfByteГInt64)
+ reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) (typex.T))(nil)).Elem(), funcMakerSliceOfByteГTypex۰T)
+ reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) (uint32))(nil)).Elem(), funcMakerSliceOfByteГUint32)
+ reflectx.RegisterFunc(reflect.TypeOf((*func([]byte) (uint64))(nil)).Elem(), funcMakerSliceOfByteГUint64)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(typex.T) ([]byte))(nil)).Elem(), funcMakerTypex۰TГSliceOfByte)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(uint32) ([]byte))(nil)).Elem(), funcMakerUint32ГSliceOfByte)
+ reflectx.RegisterFunc(reflect.TypeOf((*func(uint64) ([]byte))(nil)).Elem(), funcMakerUint64ГSliceOfByte)
}
type callerInt32ГSliceOfByte struct {
- fn func(int32) []byte
+ fn func(int32) ([]byte)
}
func funcMakerInt32ГSliceOfByte(fn interface{}) reflectx.Func {
- f := fn.(func(int32) []byte)
+ f := fn.(func(int32) ([]byte))
return &callerInt32ГSliceOfByte{fn: f}
}
@@ -80,16 +80,16 @@
return []interface{}{out0}
}
-func (c *callerInt32ГSliceOfByte) Call1x1(arg0 interface{}) interface{} {
+func (c *callerInt32ГSliceOfByte) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.(int32))
}
type callerInt64ГSliceOfByte struct {
- fn func(int64) []byte
+ fn func(int64) ([]byte)
}
func funcMakerInt64ГSliceOfByte(fn interface{}) reflectx.Func {
- f := fn.(func(int64) []byte)
+ f := fn.(func(int64) ([]byte))
return &callerInt64ГSliceOfByte{fn: f}
}
@@ -106,16 +106,16 @@
return []interface{}{out0}
}
-func (c *callerInt64ГSliceOfByte) Call1x1(arg0 interface{}) interface{} {
+func (c *callerInt64ГSliceOfByte) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.(int64))
}
type callerReflect۰TypeSliceOfByteГTypex۰TError struct {
- fn func(reflect.Type, []byte) (typex.T, error)
+ fn func(reflect.Type,[]byte) (typex.T,error)
}
func funcMakerReflect۰TypeSliceOfByteГTypex۰TError(fn interface{}) reflectx.Func {
- f := fn.(func(reflect.Type, []byte) (typex.T, error))
+ f := fn.(func(reflect.Type,[]byte) (typex.T,error))
return &callerReflect۰TypeSliceOfByteГTypex۰TError{fn: f}
}
@@ -137,11 +137,11 @@
}
type callerSliceOfByteГInt32 struct {
- fn func([]byte) int32
+ fn func([]byte) (int32)
}
func funcMakerSliceOfByteГInt32(fn interface{}) reflectx.Func {
- f := fn.(func([]byte) int32)
+ f := fn.(func([]byte) (int32))
return &callerSliceOfByteГInt32{fn: f}
}
@@ -158,16 +158,16 @@
return []interface{}{out0}
}
-func (c *callerSliceOfByteГInt32) Call1x1(arg0 interface{}) interface{} {
+func (c *callerSliceOfByteГInt32) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.([]byte))
}
type callerSliceOfByteГInt64 struct {
- fn func([]byte) int64
+ fn func([]byte) (int64)
}
func funcMakerSliceOfByteГInt64(fn interface{}) reflectx.Func {
- f := fn.(func([]byte) int64)
+ f := fn.(func([]byte) (int64))
return &callerSliceOfByteГInt64{fn: f}
}
@@ -184,16 +184,16 @@
return []interface{}{out0}
}
-func (c *callerSliceOfByteГInt64) Call1x1(arg0 interface{}) interface{} {
+func (c *callerSliceOfByteГInt64) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.([]byte))
}
type callerSliceOfByteГTypex۰T struct {
- fn func([]byte) typex.T
+ fn func([]byte) (typex.T)
}
func funcMakerSliceOfByteГTypex۰T(fn interface{}) reflectx.Func {
- f := fn.(func([]byte) typex.T)
+ f := fn.(func([]byte) (typex.T))
return &callerSliceOfByteГTypex۰T{fn: f}
}
@@ -210,16 +210,16 @@
return []interface{}{out0}
}
-func (c *callerSliceOfByteГTypex۰T) Call1x1(arg0 interface{}) interface{} {
+func (c *callerSliceOfByteГTypex۰T) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.([]byte))
}
type callerSliceOfByteГUint32 struct {
- fn func([]byte) uint32
+ fn func([]byte) (uint32)
}
func funcMakerSliceOfByteГUint32(fn interface{}) reflectx.Func {
- f := fn.(func([]byte) uint32)
+ f := fn.(func([]byte) (uint32))
return &callerSliceOfByteГUint32{fn: f}
}
@@ -236,16 +236,16 @@
return []interface{}{out0}
}
-func (c *callerSliceOfByteГUint32) Call1x1(arg0 interface{}) interface{} {
+func (c *callerSliceOfByteГUint32) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.([]byte))
}
type callerSliceOfByteГUint64 struct {
- fn func([]byte) uint64
+ fn func([]byte) (uint64)
}
func funcMakerSliceOfByteГUint64(fn interface{}) reflectx.Func {
- f := fn.(func([]byte) uint64)
+ f := fn.(func([]byte) (uint64))
return &callerSliceOfByteГUint64{fn: f}
}
@@ -262,16 +262,16 @@
return []interface{}{out0}
}
-func (c *callerSliceOfByteГUint64) Call1x1(arg0 interface{}) interface{} {
+func (c *callerSliceOfByteГUint64) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.([]byte))
}
type callerTypex۰TГSliceOfByte struct {
- fn func(typex.T) []byte
+ fn func(typex.T) ([]byte)
}
func funcMakerTypex۰TГSliceOfByte(fn interface{}) reflectx.Func {
- f := fn.(func(typex.T) []byte)
+ f := fn.(func(typex.T) ([]byte))
return &callerTypex۰TГSliceOfByte{fn: f}
}
@@ -288,16 +288,16 @@
return []interface{}{out0}
}
-func (c *callerTypex۰TГSliceOfByte) Call1x1(arg0 interface{}) interface{} {
+func (c *callerTypex۰TГSliceOfByte) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.(typex.T))
}
type callerUint32ГSliceOfByte struct {
- fn func(uint32) []byte
+ fn func(uint32) ([]byte)
}
func funcMakerUint32ГSliceOfByte(fn interface{}) reflectx.Func {
- f := fn.(func(uint32) []byte)
+ f := fn.(func(uint32) ([]byte))
return &callerUint32ГSliceOfByte{fn: f}
}
@@ -314,16 +314,16 @@
return []interface{}{out0}
}
-func (c *callerUint32ГSliceOfByte) Call1x1(arg0 interface{}) interface{} {
+func (c *callerUint32ГSliceOfByte) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.(uint32))
}
type callerUint64ГSliceOfByte struct {
- fn func(uint64) []byte
+ fn func(uint64) ([]byte)
}
func funcMakerUint64ГSliceOfByte(fn interface{}) reflectx.Func {
- f := fn.(func(uint64) []byte)
+ f := fn.(func(uint64) ([]byte))
return &callerUint64ГSliceOfByte{fn: f}
}
@@ -340,8 +340,9 @@
return []interface{}{out0}
}
-func (c *callerUint64ГSliceOfByte) Call1x1(arg0 interface{}) interface{} {
+func (c *callerUint64ГSliceOfByte) Call1x1(arg0 interface{}) (interface{}) {
return c.fn(arg0.(uint64))
}
+
// DO NOT MODIFY: GENERATED CODE
diff --git a/sdks/go/pkg/beam/core/runtime/coderx/doc.go b/sdks/go/pkg/beam/core/runtime/coderx/doc.go
index e734054..9a8ef29 100644
--- a/sdks/go/pkg/beam/core/runtime/coderx/doc.go
+++ b/sdks/go/pkg/beam/core/runtime/coderx/doc.go
@@ -19,3 +19,4 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=coderx --identifiers=encString,decString,encUint32,decUint32,encInt32,decInt32,encUint64,decUint64,encInt64,decInt64,encVarIntZ,decVarIntZ,encVarUintZ,decVarUintZ,encFloat,decFloat
+//go:generate go fmt
\ No newline at end of file
diff --git a/sdks/go/pkg/beam/core/runtime/exec/combine.go b/sdks/go/pkg/beam/core/runtime/exec/combine.go
index ce7b33d..ef8c3cd 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/combine.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/combine.go
@@ -24,6 +24,7 @@
"github.com/apache/beam/sdks/go/pkg/beam/core/graph"
"github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
+ "github.com/apache/beam/sdks/go/pkg/beam/core/metrics"
"github.com/apache/beam/sdks/go/pkg/beam/core/typex"
"github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx"
"github.com/apache/beam/sdks/go/pkg/beam/util/errorx"
@@ -36,6 +37,9 @@
UsesKey bool
Out Node
+ PID string
+ ctx context.Context
+
binaryMergeFn reflectx.Func2x1 // optimized caller in the case of binary merge accumulators
status Status
@@ -47,6 +51,11 @@
aiValConvert func(interface{}) interface{}
}
+// GetPID returns the PTransformID for this CombineFn.
+func (n *Combine) GetPID() string {
+ return n.PID
+}
+
// ID returns the UnitID for this node.
func (n *Combine) ID() UnitID {
return n.UID
@@ -106,7 +115,12 @@
}
n.status = Active
- if err := n.Out.StartBundle(ctx, id, data); err != nil {
+ // Allocating contexts all the time is expensive, but we seldom re-write them,
+ // and never accept modified contexts from users, so we will cache them per-bundle
+ // per-unit, to avoid the constant allocation overhead.
+ n.ctx = metrics.SetPTransformID(ctx, n.PID)
+
+ if err := n.Out.StartBundle(n.ctx, id, data); err != nil {
return n.fail(err)
}
return nil
@@ -122,7 +136,7 @@
// Note that we do not explicitly call merge, although it may
// be called implicitly when adding input.
- a, err := n.newAccum(ctx, value.Elm)
+ a, err := n.newAccum(n.ctx, value.Elm)
if err != nil {
return n.fail(err)
}
@@ -142,18 +156,18 @@
return n.fail(err)
}
- a, err = n.addInput(ctx, a, value.Elm, v.Elm, value.Timestamp, first)
+ a, err = n.addInput(n.ctx, a, value.Elm, v.Elm, value.Timestamp, first)
if err != nil {
return n.fail(err)
}
first = false
}
- out, err := n.extract(ctx, a)
+ out, err := n.extract(n.ctx, a)
if err != nil {
return n.fail(err)
}
- return n.Out.ProcessElement(ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: out, Timestamp: value.Timestamp})
+ return n.Out.ProcessElement(n.ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: out, Timestamp: value.Timestamp})
}
// FinishBundle completes this node's processing of a bundle.
@@ -175,7 +189,7 @@
n.extractOutputInv.Reset()
}
- if err := n.Out.FinishBundle(ctx); err != nil {
+ if err := n.Out.FinishBundle(n.ctx); err != nil {
return n.fail(err)
}
return nil
@@ -336,14 +350,14 @@
if notfirst {
a = afv.Elm2
} else {
- b, err := n.newAccum(ctx, value.Elm)
+ b, err := n.newAccum(n.Combine.ctx, value.Elm)
if err != nil {
return n.fail(err)
}
a = b
}
- a, err = n.addInput(ctx, a, value.Elm, value.Elm2, value.Timestamp, !notfirst)
+ a, err = n.addInput(n.Combine.ctx, a, value.Elm, value.Elm2, value.Timestamp, !notfirst)
if err != nil {
return n.fail(err)
}
@@ -363,7 +377,7 @@
if k == key {
continue
}
- if err := n.Out.ProcessElement(ctx, &a); err != nil {
+ if err := n.Out.ProcessElement(n.Combine.ctx, &a); err != nil {
return err
}
delete(n.cache, k)
@@ -388,7 +402,7 @@
// Need to run n.Out.ProcessElement for all the cached precombined KVs, and
// then finally Finish bundle as normal.
for _, a := range n.cache {
- if err := n.Out.ProcessElement(ctx, &a); err != nil {
+ if err := n.Out.ProcessElement(n.Combine.ctx, &a); err != nil {
return err
}
}
@@ -396,7 +410,7 @@
// Down isn't guaranteed to be called.
n.cache = nil
- return n.Combine.FinishBundle(ctx)
+ return n.Combine.FinishBundle(n.Combine.ctx)
}
// Down tears down the cache.
@@ -423,7 +437,7 @@
if n.status != Active {
return fmt.Errorf("invalid status for combine merge %v: %v", n.UID, n.status)
}
- a, err := n.newAccum(ctx, value.Elm)
+ a, err := n.newAccum(n.Combine.ctx, value.Elm)
if err != nil {
return n.fail(err)
}
@@ -447,12 +461,12 @@
first = false
continue
}
- a, err = n.mergeAccumulators(ctx, a, v.Elm)
+ a, err = n.mergeAccumulators(n.Combine.ctx, a, v.Elm)
if err != nil {
return err
}
}
- return n.Out.ProcessElement(ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: a, Timestamp: value.Timestamp})
+ return n.Out.ProcessElement(n.Combine.ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: a, Timestamp: value.Timestamp})
}
// Up eagerly gets the optimized binary merge function.
@@ -478,9 +492,9 @@
if n.status != Active {
return fmt.Errorf("invalid status for combine extract %v: %v", n.UID, n.status)
}
- out, err := n.extract(ctx, value.Elm2)
+ out, err := n.extract(n.Combine.ctx, value.Elm2)
if err != nil {
return n.fail(err)
}
- return n.Out.ProcessElement(ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: out, Timestamp: value.Timestamp})
+ return n.Out.ProcessElement(n.Combine.ctx, &FullValue{Windows: value.Windows, Elm: value.Elm, Elm2: out, Timestamp: value.Timestamp})
}
diff --git a/sdks/go/pkg/beam/core/runtime/exec/pardo.go b/sdks/go/pkg/beam/core/runtime/exec/pardo.go
index 2a96c86..49f74c5 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/pardo.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/pardo.go
@@ -49,6 +49,11 @@
err errorx.GuardedError
}
+// GetPID returns the PTransformID for this ParDo.
+func (n *ParDo) GetPID() string {
+ return n.PID
+}
+
// cacheElm holds per-window cached information about side input.
type cacheElm struct {
key typex.Window
@@ -56,10 +61,12 @@
extra []interface{}
}
+// ID returns the UnitID for this ParDo.
func (n *ParDo) ID() UnitID {
return n.UID
}
+// Up initializes this ParDo and does one-time DoFn setup.
func (n *ParDo) Up(ctx context.Context) error {
if n.status != Initializing {
return fmt.Errorf("invalid status for pardo %v: %v, want Initializing", n.UID, n.status)
@@ -79,6 +86,7 @@
return nil
}
+// StartBundle does pre-bundle processing operations for the DoFn.
func (n *ParDo) StartBundle(ctx context.Context, id string, data DataContext) error {
if n.status != Up {
return fmt.Errorf("invalid status for pardo %v: %v, want Up", n.UID, n.status)
@@ -102,6 +110,7 @@
return nil
}
+// ProcessElement processes each parallel element with the DoFn.
func (n *ParDo) ProcessElement(ctx context.Context, elm *FullValue, values ...ReStream) error {
if n.status != Active {
return fmt.Errorf("invalid status for pardo %v: %v, want Active", n.UID, n.status)
@@ -148,6 +157,9 @@
return explode || usesSideInput
}
+// FinishBundle does post-bundle processing operations for the DoFn.
+// Note: This is not a "FinalizeBundle" operation. Data is not yet durably
+// persisted at this point.
func (n *ParDo) FinishBundle(ctx context.Context) error {
if n.status != Active {
return fmt.Errorf("invalid status for pardo %v: %v, want Active", n.UID, n.status)
@@ -167,6 +179,7 @@
return nil
}
+// Down performs best-effort teardown of DoFn resources. (May not run.)
func (n *ParDo) Down(ctx context.Context) error {
if n.status == Down {
return n.err.Error()
diff --git a/sdks/go/pkg/beam/core/runtime/exec/plan.go b/sdks/go/pkg/beam/core/runtime/exec/plan.go
index bae0c3c..2436f08 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/plan.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/plan.go
@@ -33,7 +33,7 @@
id string
roots []Root
units []Unit
- parDoIds []string
+ parDoIDs []string
status Status
@@ -41,6 +41,12 @@
source *DataSource
}
+// hasPID provides a common interface for extracting PTransformIDs
+// from Units.
+type hasPID interface {
+ GetPID() string
+}
+
// NewPlan returns a new bundle execution plan from the given units.
func NewPlan(id string, units []Unit) (*Plan, error) {
var roots []Root
@@ -57,8 +63,8 @@
if s, ok := u.(*DataSource); ok {
source = s
}
- if p, ok := u.(*ParDo); ok {
- pardoIDs = append(pardoIDs, p.PID)
+ if p, ok := u.(hasPID); ok {
+ pardoIDs = append(pardoIDs, p.GetPID())
}
}
if len(roots) == 0 {
@@ -70,7 +76,7 @@
status: Initializing,
roots: roots,
units: units,
- parDoIds: pardoIDs,
+ parDoIDs: pardoIDs,
source: source,
}, nil
}
@@ -175,7 +181,7 @@
}
}
- for _, pt := range p.parDoIds {
+ for _, pt := range p.parDoIDs {
transforms[pt] = &fnpb.Metrics_PTransform{
User: metrics.ToProto(p.id, pt),
}
diff --git a/sdks/go/pkg/beam/core/runtime/exec/translate.go b/sdks/go/pkg/beam/core/runtime/exec/translate.go
index 031ac82..df69a9a 100644
--- a/sdks/go/pkg/beam/core/runtime/exec/translate.go
+++ b/sdks/go/pkg/beam/core/runtime/exec/translate.go
@@ -25,7 +25,7 @@
"github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
"github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx"
- "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1"
+ v1 "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx/v1"
"github.com/apache/beam/sdks/go/pkg/beam/core/typex"
"github.com/apache/beam/sdks/go/pkg/beam/core/util/protox"
"github.com/apache/beam/sdks/go/pkg/beam/core/util/stringx"
@@ -375,12 +375,13 @@
switch op {
case graph.ParDo:
- n := &ParDo{UID: b.idgen.New(), PID: id.to, Inbound: in, Out: out}
+ n := &ParDo{UID: b.idgen.New(), Inbound: in, Out: out}
n.Fn, err = graph.AsDoFn(fn)
if err != nil {
return nil, err
}
- // TODO(lostluck): 2018/03/22 Look into why transform.UniqueName isn't populated at this point, and switch n.PID to that instead.
+ // transform.UniqueName may be per-bundle, which isn't useful for metrics.
+ // Use the short name for the DoFn instead.
n.PID = path.Base(n.Fn.Name())
input := unmarshalKeyedValues(transform.GetInputs())
@@ -413,6 +414,10 @@
}
cn.UsesKey = typex.IsKV(in[0].Type)
+ // transform.UniqueName may be per-bundle, which isn't useful for metrics.
+ // Use the short name for the DoFn instead.
+ cn.PID = path.Base(cn.Fn.Name())
+
switch urn {
case urnPerKeyCombinePre:
inputs := unmarshalKeyedValues(transform.GetInputs())
diff --git a/sdks/go/pkg/beam/doc_test.go b/sdks/go/pkg/beam/doc_test.go
index 645926f..92a2b03 100644
--- a/sdks/go/pkg/beam/doc_test.go
+++ b/sdks/go/pkg/beam/doc_test.go
@@ -128,9 +128,9 @@
a := textio.Read(s, "...some file path...") // PCollection<string>
beam.Seq(s, a,
- strconv.Atoi, // string to int
+ strconv.Atoi, // string to int
func(i int) float64 { return float64(i) }, // int to float64
- math.Signbit, // float64 to bool
+ math.Signbit, // float64 to bool
) // PCollection<bool>
}
diff --git a/sdks/go/pkg/beam/runners/direct/direct.go b/sdks/go/pkg/beam/runners/direct/direct.go
index f456391..608fe98 100644
--- a/sdks/go/pkg/beam/runners/direct/direct.go
+++ b/sdks/go/pkg/beam/runners/direct/direct.go
@@ -220,8 +220,13 @@
var u exec.Node
switch edge.Op {
case graph.ParDo:
- pardo := &exec.ParDo{UID: b.idgen.New(), Fn: edge.DoFn, Inbound: edge.Input, Out: out}
- pardo.PID = path.Base(pardo.Fn.Name())
+ pardo := &exec.ParDo{
+ UID: b.idgen.New(),
+ Fn: edge.DoFn,
+ Inbound: edge.Input,
+ Out: out,
+ PID: path.Base(edge.DoFn.Name()),
+ }
if len(edge.Input) == 1 {
u = pardo
break
@@ -249,7 +254,13 @@
case graph.Combine:
usesKey := typex.IsKV(edge.Input[0].Type)
- u = &exec.Combine{UID: b.idgen.New(), Fn: edge.CombineFn, UsesKey: usesKey, Out: out[0]}
+ u = &exec.Combine{
+ UID: b.idgen.New(),
+ Fn: edge.CombineFn,
+ UsesKey: usesKey,
+ Out: out[0],
+ PID: path.Base(edge.CombineFn.Name()),
+ }
case graph.CoGBK:
u = &CoGBK{UID: b.idgen.New(), Edge: edge, Out: out[0]}
diff --git a/sdks/go/pkg/beam/testing/passert/passert.go b/sdks/go/pkg/beam/testing/passert/passert.go
index c267960..21c9c4c 100644
--- a/sdks/go/pkg/beam/testing/passert/passert.go
+++ b/sdks/go/pkg/beam/testing/passert/passert.go
@@ -29,6 +29,7 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=passert --identifiers=diffFn,failFn,failKVFn,failGBKFn,hashFn,sumFn
+//go:generate go fmt
// Equals verifies the given collection has the same values as the given
// values, under coder equality. The values can be provided as single
diff --git a/sdks/go/pkg/beam/transforms/filter/filter.go b/sdks/go/pkg/beam/transforms/filter/filter.go
index 6bf86c0..e9cebab 100644
--- a/sdks/go/pkg/beam/transforms/filter/filter.go
+++ b/sdks/go/pkg/beam/transforms/filter/filter.go
@@ -25,6 +25,7 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=filter --identifiers=filterFn,mapFn,keyFn
+//go:generate go fmt
var (
sig = funcx.MakePredicate(beam.TType) // T -> bool
diff --git a/sdks/go/pkg/beam/transforms/top/top.go b/sdks/go/pkg/beam/transforms/top/top.go
index cc72a22..71122cc 100644
--- a/sdks/go/pkg/beam/transforms/top/top.go
+++ b/sdks/go/pkg/beam/transforms/top/top.go
@@ -32,6 +32,7 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=top --identifiers=combineFn
+//go:generate go fmt
var (
sig = funcx.MakePredicate(beam.TType, beam.TType) // (T, T) -> bool
diff --git a/sdks/go/pkg/beam/util.go b/sdks/go/pkg/beam/util.go
index 9fad037..4209b31 100644
--- a/sdks/go/pkg/beam/util.go
+++ b/sdks/go/pkg/beam/util.go
@@ -17,6 +17,7 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=beam --identifiers=addFixedKeyFn,dropKeyFn,dropValueFn,swapKVFn,explodeFn,jsonDec,jsonEnc,protoEnc,protoDec,makePartitionFn,createFn
+//go:generate go fmt
// We have some freedom to create various utilities, users can use depending on
// preferences. One point of keeping Pipeline transformation functions plain Go
diff --git a/sdks/go/pkg/beam/x/debug/doc.go b/sdks/go/pkg/beam/x/debug/doc.go
index 67498bb..20345bf 100644
--- a/sdks/go/pkg/beam/x/debug/doc.go
+++ b/sdks/go/pkg/beam/x/debug/doc.go
@@ -19,3 +19,4 @@
//go:generate go install github.com/apache/beam/sdks/go/cmd/starcgen
//go:generate starcgen --package=debug --identifiers=headFn,headKVFn,discardFn,printFn,printKVFn,printGBKFn
+//go:generate go fmt
diff --git a/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml b/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml
index 545c2f8..c0de9c8 100644
--- a/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml
+++ b/sdks/java/build-tools/src/main/resources/beam/spotbugs-filter.xml
@@ -382,7 +382,7 @@
</Match>
<Match>
- <Class name="org.apache.beam.sdk.util.GcsUtil$StorageObjectOrIOException"/>
+ <Class name="org.apache.beam.sdk.extensions.gcp.util.GcsUtil$StorageObjectOrIOException"/>
<Bug pattern="NM_CLASS_NOT_EXCEPTION"/>
<!-- It is clear from the name that this class holds either StorageObject or IOException. -->
</Match>
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java
index d8f4ef9..447e086 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java
@@ -69,6 +69,11 @@
.build();
private final Schema schema;
+
+ public UUID getId() {
+ return id;
+ }
+
private final UUID id;
@Nullable private transient Coder<Row> delegateCoder = null;
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
index 07bb37d..994d695 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
@@ -97,6 +97,7 @@
private static final ForLoadedType LIST_CODER_TYPE = new ForLoadedType(ListCoder.class);
private static final ForLoadedType MAP_CODER_TYPE = new ForLoadedType(MapCoder.class);
private static final BitSetCoder NULL_LIST_CODER = BitSetCoder.of();
+ private static final VarIntCoder VAR_INT_CODER = VarIntCoder.of();
private static final ForLoadedType NULLABLE_CODER = new ForLoadedType(NullableCoder.class);
private static final String CODERS_FIELD_NAME = "FIELD_CODERS";
@@ -219,6 +220,9 @@
static void encodeDelegate(
Coder[] coders, Row value, OutputStream outputStream, boolean hasNullableFields)
throws IOException {
+ // Encode the field count. This allows us to handle compatible schema changes.
+ VAR_INT_CODER.encode(value.getFieldCount(), outputStream);
+ // Encode a bitmap for the null fields to save having to encode a bunch of nulls.
NULL_LIST_CODER.encode(scanNullFields(value, hasNullableFields), outputStream);
for (int idx = 0; idx < value.getFieldCount(); ++idx) {
Object fieldValue = value.getValue(idx);
@@ -289,15 +293,24 @@
// per-field Coders.
static Row decodeDelegate(Schema schema, Coder[] coders, InputStream inputStream)
throws IOException {
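+      // Read the field count written by encodeDelegate; it may differ from coders.length if the schema has changed.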
+ int fieldCount = VAR_INT_CODER.decode(inputStream);
BitSet nullFields = NULL_LIST_CODER.decode(inputStream);
List<Object> fieldValues = Lists.newArrayListWithCapacity(coders.length);
- for (int i = 0; i < coders.length; ++i) {
- if (nullFields.get(i)) {
- fieldValues.add(null);
- } else {
- fieldValues.add(coders[i].decode(inputStream));
+ for (int i = 0; i < fieldCount; ++i) {
+ // In the case of a schema change going backwards, fieldCount might be > coders.length,
+ // in which case we drop the extra fields.
+ if (i < coders.length) {
+ if (nullFields.get(i)) {
+ fieldValues.add(null);
+ } else {
+ fieldValues.add(coders[i].decode(inputStream));
+ }
}
}
+ // If the schema was evolved to contain more fields, we fill them in with nulls.
+ for (int i = fieldCount; i < coders.length; i++) {
+ fieldValues.add(null);
+ }
// We call attachValues instead of setValues. setValues validates every element in the list
// is of the proper type, potentially converts to the internal type Row stores, and copies
// all values. Since we assume that decode is always being called on a previously-encoded
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRange.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRange.java
index ae3a5b6..e398a9d 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRange.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRange.java
@@ -19,10 +19,17 @@
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.transforms.splittabledofn.HasDefaultTracker;
+import org.apache.beam.sdk.util.VarInt;
+import org.apache.beam.sdk.values.TypeDescriptor;
/** A restriction represented by a range of integers [from, to). */
public class OffsetRange
@@ -98,4 +105,47 @@
}
return res;
}
+
+ /** A coder for {@link OffsetRange}s. */
+ public static class Coder extends AtomicCoder<OffsetRange> {
+ private static final Coder INSTANCE = new Coder();
+ private static final TypeDescriptor<OffsetRange> TYPE_DESCRIPTOR =
+ new TypeDescriptor<OffsetRange>() {};
+
+ public static Coder of() {
+ return INSTANCE;
+ }
+
+ @Override
+ public void encode(OffsetRange value, OutputStream outStream)
+ throws CoderException, IOException {
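+      // The range is encoded as two varints: the inclusive start ("from") followed by the exclusive end ("to").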
+ VarInt.encode(value.from, outStream);
+ VarInt.encode(value.to, outStream);
+ }
+
+ @Override
+ public OffsetRange decode(InputStream inStream) throws CoderException, IOException {
+ return new OffsetRange(VarInt.decodeLong(inStream), VarInt.decodeLong(inStream));
+ }
+
+ @Override
+ public boolean isRegisterByteSizeObserverCheap(OffsetRange value) {
+ return true;
+ }
+
+ @Override
+ protected long getEncodedElementByteSize(OffsetRange value) throws Exception {
+ return (long) VarInt.getLength(value.from) + VarInt.getLength(value.to);
+ }
+
+ @Override
+ public boolean consistentWithEquals() {
+ return true;
+ }
+
+ @Override
+ public TypeDescriptor<OffsetRange> getEncodedTypeDescriptor() {
+ return TYPE_DESCRIPTOR;
+ }
+ }
}
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
index e77b92d..2d7ff17 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
@@ -949,17 +949,12 @@
Class<? extends PipelineOptions> klass)
throws IntrospectionException {
- // TODO(BEAM-308): Make this an error in users pipelines for the next major version
- // of Apache Beam.
- if (!Modifier.isPublic(iface.getModifiers())) {
- LOG.warn(
- "Using non-public interface {} may fail during runtime. The JVM requires that "
- + "all non-public interfaces to be in the same package; otherwise, it would not be "
- + "possible for the PipelineOptions proxy class to implement all of the interfaces, "
- + "regardless of what package it is defined in. This will become an error in"
- + "a future version of Apache Beam.",
- iface.getName());
- }
+ checkArgument(
+ Modifier.isPublic(iface.getModifiers()),
+ "Please mark non-public interface %s as public. The JVM requires that "
+ + "all non-public interfaces to be in the same package which will prevent the "
+ + "PipelineOptions proxy class to implement all of the interfaces.",
+ iface.getName());
// Verify that there are no methods with the same name with two different return types.
validateReturnType(iface);
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
index 30e1f9c..21fd782 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
@@ -41,6 +41,7 @@
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Maps;
/** {@link Schema} describes the fields in {@link Row}. */
@Experimental(Kind.SCHEMAS)
@@ -77,6 +78,8 @@
}
// A mapping between field names and indices.
private final BiMap<String, Integer> fieldIndices = HashBiMap.create();
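+  // Encoding position of each field, keyed by field name; defaults to the field's declaration order.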
+ private Map<String, Integer> encodingPositions = Maps.newHashMap();
+
private final List<Field> fields;
// Cache the hashCode, so it doesn't have to be recomputed. Schema objects are immutable, so this
// is correct.
@@ -210,6 +213,7 @@
throw new IllegalArgumentException(
"Duplicate field " + field.getName() + " added to schema");
}
+ encodingPositions.put(field.getName(), index);
fieldIndices.put(field.getName(), index++);
}
this.hashCode = Objects.hash(fieldIndices, fields);
@@ -224,6 +228,16 @@
this.uuid = uuid;
}
+ /** Gets the encoding positions for this schema. */
+ public Map<String, Integer> getEncodingPositions() {
+ return encodingPositions;
+ }
+
+ /** Sets the encoding positions for this schema. */
+ public void setEncodingPositions(Map<String, Integer> encodingPositions) {
+ this.encodingPositions = encodingPositions;
+ }
+
/** Get this schema's UUID. */
@Nullable
public UUID getUUID() {
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
index 2c54466..54e6f08 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
@@ -341,16 +341,18 @@
/**
* Adds the given input value to the given accumulator, returning the new accumulator value.
*
- * <p>For efficiency, the input accumulator may be modified and returned.
+ * @param mutableAccumulator may be modified and returned for efficiency
+ * @param input should not be mutated
*/
- public abstract AccumT addInput(AccumT accumulator, InputT input);
+ public abstract AccumT addInput(AccumT mutableAccumulator, InputT input);
/**
* Returns an accumulator representing the accumulation of all the input values accumulated in
* the merging accumulators.
*
- * <p>May modify any of the argument accumulators. May return a fresh accumulator, or may return
- * one of the (modified) argument accumulators.
+ * @param accumulators only the first accumulator may be modified and returned for efficiency;
+ * the other accumulators should not be mutated, because they may be shared with other code
+ * and mutating them could lead to incorrect results or data corruption.
*/
public abstract AccumT mergeAccumulators(Iterable<AccumT> accumulators);
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
index e3c0604..069215a 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
@@ -232,6 +232,7 @@
* Returns the input element to be processed.
*
* <p>The element will not be changed -- it is safe to cache, etc. without copying.
* Implementations of the {@link DoFn.ProcessElement} method should not mutate the element.
*/
public abstract InputT element();
@@ -539,7 +540,7 @@
* <p>The signature of this method must satisfy the following constraints:
*
* <ul>
- * <li>If one of its arguments is a subtype of {@link RestrictionTracker}, then it is a <a
+ * <li>If one of its arguments is a {@link RestrictionTracker}, then it is a <a
* href="https://s.apache.org/splittable-do-fn">splittable</a> {@link DoFn} subject to the
* separate requirements described below. Items below are assuming this is not a splittable
* {@link DoFn}.
@@ -572,8 +573,8 @@
* <h2>Splittable DoFn's</h2>
*
* <p>A {@link DoFn} is <i>splittable</i> if its {@link ProcessElement} method has a parameter
- * whose type is a subtype of {@link RestrictionTracker}. This is an advanced feature and an
- * overwhelming majority of users will never need to write a splittable {@link DoFn}.
+ * whose type is {@link RestrictionTracker}. This is an advanced feature and an overwhelming
+ * majority of users will never need to write a splittable {@link DoFn}.
*
* <p>Not all runners support Splittable DoFn. See the <a
* href="https://beam.apache.org/documentation/runners/capability-matrix/">capability matrix</a>.
@@ -586,12 +587,10 @@
* <ul>
* <li>It <i>must</i> define a {@link GetInitialRestriction} method.
* <li>It <i>may</i> define a {@link SplitRestriction} method.
- * <li>It <i>may</i> define a {@link NewTracker} method returning the same type as the type of
- * the {@link RestrictionTracker} argument of {@link ProcessElement}, which in turn must be
- * a subtype of {@code RestrictionTracker<R>} where {@code R} is the restriction type
- * returned by {@link GetInitialRestriction}. This method is optional in case the
- * restriction type returned by {@link GetInitialRestriction} implements {@link
- * HasDefaultTracker}.
+ * <li>It <i>may</i> define a {@link NewTracker} method returning a subtype of {@code
+ * RestrictionTracker<R>} where {@code R} is the restriction type returned by {@link
+ * GetInitialRestriction}. This method is optional in case the restriction type returned by
+ * {@link GetInitialRestriction} implements {@link HasDefaultTracker}.
* <li>It <i>may</i> define a {@link GetRestrictionCoder} method.
* <li>The type of restrictions used by all of these methods must be the same.
* <li>Its {@link ProcessElement} method <i>may</i> return a {@link ProcessContinuation} to
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
index 3dc24d9..1db0a48 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
@@ -30,26 +30,35 @@
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
import java.util.List;
import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.AtomicCoder;
-import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.DurationCoder;
import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.NullableCoder;
import org.apache.beam.sdk.coders.SnappyCoder;
import org.apache.beam.sdk.coders.StructuredCoder;
import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.Contextful.Fn;
+import org.apache.beam.sdk.transforms.DoFn.BoundedPerElement;
import org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement;
+import org.apache.beam.sdk.transforms.Watch.Growth.PollResult;
+import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.VarInt;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
@@ -57,6 +66,7 @@
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.beam.sdk.values.TypeDescriptors.TypeVariableExtractor;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.base.MoreObjects;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Maps;
@@ -178,14 +188,29 @@
}
/**
- * Sets the watermark - an approximate lower bound on timestamps of future new outputs from
- * this {@link PollFn}.
+ * Returns a new {@link PollResult} like this one with the provided watermark. The watermark
+ * represents an approximate lower bound on timestamps of future new outputs from the {@link
+ * PollFn}.
*/
public PollResult<OutputT> withWatermark(Instant watermark) {
checkNotNull(watermark, "watermark");
return new PollResult<>(outputs, watermark);
}
+ /** Returns a new {@link PollResult} like this one with the provided outputs. */
+ public PollResult<OutputT> withOutputs(List<TimestampedValue<OutputT>> outputs) {
+ checkNotNull(outputs);
+ return new PollResult<>(outputs, watermark);
+ }
+
+ @Override
+ public String toString() {
+ return MoreObjects.toStringHelper(this)
+ .add("watermark", watermark)
+ .add("outputs", outputs)
+ .toString();
+ }
+
/**
* Constructs a {@link PollResult} with the given outputs and declares that there will be no
* new outputs for the current input. The {@link PollFn} will not be called again for this
@@ -228,6 +253,23 @@
}
return res;
}
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) {
+ return true;
+ }
+ if (o == null || getClass() != o.getClass()) {
+ return false;
+ }
+ PollResult<?> that = (PollResult<?>) o;
+ return Objects.equals(outputs, that.outputs) && Objects.equals(watermark, that.watermark);
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(outputs, watermark);
+ }
}
/**
@@ -684,21 +726,61 @@
}
}
- return input
- .apply(
- ParDo.of(new WatchGrowthFn<>(this, outputCoder, outputKeyFn, outputKeyCoder))
- .withSideInputs(getPollFn().getRequirements().getSideInputs()))
+ PCollection<KV<InputT, List<TimestampedValue<OutputT>>>> polledPc =
+ input
+ .apply(
+ ParDo.of(new WatchGrowthFn<>(this, outputCoder, outputKeyFn, outputKeyCoder))
+ .withSideInputs(getPollFn().getRequirements().getSideInputs()))
+ .setCoder(
+ KvCoder.of(
+ input.getCoder(),
+ ListCoder.of(TimestampedValue.TimestampedValueCoder.of(outputCoder))));
+ return polledPc
+ .apply(ParDo.of(new PollResultSplitFn<>()))
.setCoder(KvCoder.of(input.getCoder(), outputCoder));
}
}
+ /** A splittable {@link DoFn} that emits the outputs buffered in each {@link PollResult}. */
+ @BoundedPerElement
+ private static class PollResultSplitFn<InputT, OutputT>
+ extends DoFn<KV<InputT, List<TimestampedValue<OutputT>>>, KV<InputT, OutputT>> {
+
+ @ProcessElement
+ public void processElement(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
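+      // Claim each index in the offset range and re-emit the buffered output at its original timestamp.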
+ long position = tracker.currentRestriction().getFrom();
+ while (tracker.tryClaim(position)) {
+ TimestampedValue<OutputT> value = c.element().getValue().get((int) position);
+ c.outputWithTimestamp(KV.of(c.element().getKey(), value.getValue()), value.getTimestamp());
+ c.updateWatermark(value.getTimestamp());
+ position += 1L;
+ }
+ }
+
+ @GetInitialRestriction
+ public OffsetRange getInitialRestriction(KV<InputT, List<TimestampedValue<OutputT>>> element) {
+ return new OffsetRange(0, element.getValue().size());
+ }
+
+ @NewTracker
+ public OffsetRangeTracker newTracker(OffsetRange restriction) {
+ return restriction.newTracker();
+ }
+
+ @GetRestrictionCoder
+ public Coder<OffsetRange> getRestrictionCoder() {
+ return OffsetRange.Coder.of();
+ }
+ }
+
@UnboundedPerElement
private static class WatchGrowthFn<InputT, OutputT, KeyT, TerminationStateT>
- extends DoFn<InputT, KV<InputT, OutputT>> {
+ extends DoFn<InputT, KV<InputT, List<TimestampedValue<OutputT>>>> {
private final Watch.Growth<InputT, OutputT, KeyT> spec;
private final Coder<OutputT> outputCoder;
private final SerializableFunction<OutputT, KeyT> outputKeyFn;
private final Coder<KeyT> outputKeyCoder;
+ private final Funnel<OutputT> coderFunnel;
private WatchGrowthFn(
Growth<InputT, OutputT, KeyT> spec,
@@ -709,74 +791,131 @@
this.outputCoder = outputCoder;
this.outputKeyFn = outputKeyFn;
this.outputKeyCoder = outputKeyCoder;
+ this.coderFunnel =
+ (from, into) -> {
+ try {
+ // Rather than hashing the output itself, hash the output key.
+ KeyT outputKey = outputKeyFn.apply(from);
+ outputKeyCoder.encode(outputKey, Funnels.asOutputStream(into));
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ };
}
@ProcessElement
public ProcessContinuation process(
- ProcessContext c, final GrowthTracker<OutputT, KeyT, TerminationStateT> tracker)
+ ProcessContext c,
+ RestrictionTracker<GrowthState, KV<Growth.PollResult<OutputT>, TerminationStateT>> tracker)
throws Exception {
- if (!tracker.hasPending() && !tracker.currentRestriction().isOutputComplete) {
- Instant now = Instant.now();
- Growth.PollResult<OutputT> res =
- spec.getPollFn().getClosure().apply(c.element(), wrapProcessContext(c));
- // TODO (https://issues.apache.org/jira/browse/BEAM-2680):
- // Consider truncating the pending outputs if there are too many, to avoid blowing
- // up the state. In that case, we'd rely on the next poll cycle to provide more outputs.
- // All outputs would still have to be stored in state.completed, but it is more compact
- // because it stores hashes and because it could potentially be garbage-collected.
- int numPending = tracker.addNewAsPending(res);
- if (numPending > 0) {
- LOG.info(
- "{} - current round of polling took {} ms and returned {} results, "
- + "of which {} were new. The output is {}.",
- c.element(),
- new Duration(now, Instant.now()).getMillis(),
- res.getOutputs().size(),
- numPending,
- BoundedWindow.TIMESTAMP_MAX_VALUE.equals(res.getWatermark())
- ? "final"
- : "not yet final");
+
+ GrowthState currentRestriction = tracker.currentRestriction();
+ if (currentRestriction instanceof NonPollingGrowthState) {
+ Growth.PollResult<OutputT> priorPoll =
+ ((NonPollingGrowthState<OutputT>) currentRestriction).getPending();
+ if (tracker.tryClaim(KV.of(priorPoll, null))) {
+ if (!priorPoll.getOutputs().isEmpty()) {
+ LOG.info(
+ "{} - re-emitting output of prior poll containing {} results.",
+ c.element(),
+ priorPoll.getOutputs().size());
+ c.output(KV.of(c.element(), priorPoll.getOutputs()));
+ }
+ if (priorPoll.getWatermark() != null) {
+ c.updateWatermark(priorPoll.getWatermark());
+ }
}
+ return stop();
}
- int numEmittedInThisRound = 0;
- int numTotalPending = tracker.getNumPending();
- int numPreviouslyEmitted = tracker.currentRestriction().completed.size();
- int numTotalKnown = numPreviouslyEmitted + numTotalPending;
- while (true) {
- c.updateWatermark(tracker.getWatermark());
- Map.Entry<HashCode, TimestampedValue<OutputT>> entry = tracker.getNextPending();
- if (entry == null || !tracker.tryClaim(entry.getKey())) {
- break;
- }
- TimestampedValue<OutputT> nextPending = entry.getValue();
- c.outputWithTimestamp(
- KV.of(c.element(), nextPending.getValue()), nextPending.getTimestamp());
- ++numEmittedInThisRound;
- }
+
+ // Poll for additional elements.
+ Instant now = Instant.now();
+ Growth.PollResult<OutputT> res =
+ spec.getPollFn().getClosure().apply(c.element(), wrapProcessContext(c));
+
+ PollingGrowthState<TerminationStateT> pollingRestriction =
+ (PollingGrowthState<TerminationStateT>) currentRestriction;
+ // Produce a poll result that only contains never seen before results.
+ Growth.PollResult<OutputT> newResults =
+ computeNeverSeenBeforeResults(pollingRestriction, res);
+
+ // If we had zero new results, attempt to update the watermark if the poll result
+ // provided a watermark. Otherwise attempt to claim all pending outputs.
LOG.info(
- "{} - emitted {} new results (of {} total known: {} emitted so far, {} more to emit).",
+ "{} - current round of polling took {} ms and returned {} results, "
+ + "of which {} were new.",
c.element(),
- numEmittedInThisRound,
- numTotalKnown,
- numEmittedInThisRound + numPreviouslyEmitted,
- numTotalPending - numEmittedInThisRound);
- Instant watermark = tracker.getWatermark();
- if (watermark != null) {
- // Null means the poll result did not provide a watermark and there were no new elements,
- // so we have no information to update the watermark and should keep it as-is.
- c.updateWatermark(watermark);
+ new Duration(now, Instant.now()).getMillis(),
+ res.getOutputs().size(),
+ newResults.getOutputs().size());
+
+ TerminationStateT terminationState = pollingRestriction.getTerminationState();
+ if (!newResults.getOutputs().isEmpty()) {
+ terminationState =
+ getTerminationCondition().onSeenNewOutput(Instant.now(), terminationState);
}
- // No more pending outputs - future output will come from more polling,
- // unless output is complete or termination condition is reached.
- if (tracker.shouldPollMore()) {
+
+ if (!tracker.tryClaim(KV.of(newResults, terminationState))) {
+ LOG.info("{} - will not emit poll result tryClaim failed.", c.element());
+ return stop();
+ }
+
+ if (!newResults.getOutputs().isEmpty()) {
+ c.output(KV.of(c.element(), newResults.getOutputs()));
+ }
+
+ if (newResults.getWatermark() != null) {
+ c.updateWatermark(newResults.getWatermark());
+ }
+
+ Instant currentTime = Instant.now();
+ if (getTerminationCondition().canStopPolling(currentTime, terminationState)) {
LOG.info(
- "{} - emitted all {} known results so far; will resume polling in {} ms",
+ "{} - told to stop polling by polling function at {} with termination state {}.",
c.element(),
- numTotalKnown,
- spec.getPollInterval().getMillis());
- return resume().withResumeDelay(spec.getPollInterval());
+ currentTime,
+ getTerminationCondition().toString(terminationState));
+ return stop();
}
- return stop();
+
+ if (BoundedWindow.TIMESTAMP_MAX_VALUE.equals(newResults.getWatermark())) {
+ LOG.info("{} - will stop polling, reached max timestamp.", c.element());
+ return stop();
+ }
+
+ LOG.info(
+ "{} - will resume polling in {} ms.", c.element(), spec.getPollInterval().getMillis());
+ return resume().withResumeDelay(spec.getPollInterval());
+ }
+
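+    // Hashes the output's key (via coderFunnel) into a 128-bit fingerprint used to deduplicate poll results.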
+ private HashCode hash128(OutputT value) {
+ return Hashing.murmur3_128().hashObject(value, coderFunnel);
+ }
+
+ private Growth.PollResult computeNeverSeenBeforeResults(
+ PollingGrowthState<TerminationStateT> state, Growth.PollResult<OutputT> pollResult) {
+ // Collect results to include as newly pending. Note that the poll result may in theory
+ // contain multiple outputs mapping to the same output key - we need to ignore duplicates
+ // here already.
+ Map<HashCode, TimestampedValue<OutputT>> newPending = Maps.newHashMap();
+ for (TimestampedValue<OutputT> output : pollResult.getOutputs()) {
+ OutputT value = output.getValue();
+ HashCode hash = hash128(value);
+ if (state.getCompleted().containsKey(hash) || newPending.containsKey(hash)) {
+ continue;
+ }
+ // TODO (https://issues.apache.org/jira/browse/BEAM-2680):
+ // Consider adding only at most N pending elements and ignoring others,
+ // instead relying on future poll rounds to provide them, in order to avoid
+ // blowing up the state. Combined with garbage collection of PollingGrowthState.completed,
+ // this would make the transform scalable to very large poll results.
+ newPending.put(hash, output);
+ }
+
+ return pollResult.withOutputs(
+ Ordering.natural()
+ .onResultOf((TimestampedValue<OutputT> value) -> value.getTimestamp())
+ .sortedCopy(newPending.values()));
}
private Growth.TerminationCondition<InputT, TerminationStateT> getTerminationCondition() {
@@ -784,177 +923,144 @@
}
@GetInitialRestriction
- public GrowthState<OutputT, KeyT, TerminationStateT> getInitialRestriction(InputT element) {
- return new GrowthState<>(getTerminationCondition().forNewInput(Instant.now(), element));
+ public GrowthState getInitialRestriction(InputT element) {
+ return PollingGrowthState.of(getTerminationCondition().forNewInput(Instant.now(), element));
}
@NewTracker
- public GrowthTracker<OutputT, KeyT, TerminationStateT> newTracker(
- GrowthState<OutputT, KeyT, TerminationStateT> restriction) {
- return new GrowthTracker<>(
- outputKeyFn, outputKeyCoder, restriction, getTerminationCondition());
+ public GrowthTracker<OutputT, TerminationStateT> newTracker(GrowthState restriction) {
+ return new GrowthTracker<>(restriction, coderFunnel);
}
@GetRestrictionCoder
@SuppressWarnings({"unchecked", "rawtypes"})
- public Coder<GrowthState<OutputT, KeyT, TerminationStateT>> getRestrictionCoder() {
+ public Coder<GrowthState> getRestrictionCoder() {
return SnappyCoder.of(
GrowthStateCoder.of(outputCoder, (Coder) spec.getTerminationPerInput().getStateCoder()));
}
}
+ /** A base class for all restrictions related to the {@link Growth} SplittableDoFn. */
+ abstract static class GrowthState {}
+
+ /**
+ * Stores the prior pending poll results related to the {@link Growth} SplittableDoFn. Used to
+ * represent the primary restriction during checkpointing, which is replayed if the primary
+ * ever needs to be re-executed.
+ */
+ @AutoValue
@VisibleForTesting
- static class GrowthState<OutputT, KeyT, TerminationStateT> {
+ abstract static class NonPollingGrowthState<OutputT> extends GrowthState {
+ public static <OutputT> NonPollingGrowthState<OutputT> of(Growth.PollResult<OutputT> pending) {
+ return new AutoValue_Watch_NonPollingGrowthState(pending);
+ }
+
+ /**
+ * Contains all pending results to output. Checkpointing/splitting moves "pending" outputs to
+ * the completed set.
+ */
+ public abstract Growth.PollResult<OutputT> getPending();
+ }
+
+ /**
+ * A restriction for the {@link Growth} transform which represents a polling state. The
+ * restriction represents an unbounded amount of work until one of the termination conditions of
+ * the {@link Growth} transform is met.
+ */
+ @AutoValue
+ @VisibleForTesting
+ abstract static class PollingGrowthState<TerminationStateT> extends GrowthState {
+ public static <TerminationStateT> PollingGrowthState<TerminationStateT> of(
+ TerminationStateT terminationState) {
+ return new AutoValue_Watch_PollingGrowthState(ImmutableMap.of(), null, terminationState);
+ }
+
+ public static <TerminationStateT> PollingGrowthState<TerminationStateT> of(
+ ImmutableMap<HashCode, Instant> completed,
+ Instant pollWatermark,
+ TerminationStateT terminationState) {
+ return new AutoValue_Watch_PollingGrowthState(completed, pollWatermark, terminationState);
+ }
+
// Hashes and timestamps of outputs that have already been output and should be omitted
// from future polls. Timestamps are preserved to allow garbage-collecting this state
- // in the future, e.g. dropping elements from "completed" and from addNewAsPending() if their
- // timestamp is more than X behind the watermark.
+ // in the future, e.g. dropping elements from "completed" and from
+ // computeNeverSeenBeforeResults() if their timestamp is more than X behind the watermark.
// As of writing, we don't do this, but preserve the information for forward compatibility
// in case of pipeline update. TODO: do this.
- private final ImmutableMap<HashCode, Instant> completed;
- // Outputs that are known to be present in a poll result, but have not yet been returned
- // from a ProcessElement call, sorted by timestamp to help smooth watermark progress.
- private final ImmutableMap<HashCode, TimestampedValue<OutputT>> pending;
- // If true, processing of this restriction should only output "pending". Otherwise, it should
- // also continue polling.
- private final boolean isOutputComplete;
- // Can be null only if isOutputComplete is true.
- @Nullable private final TerminationStateT terminationState;
- // A lower bound on timestamps of future outputs from PollFn, excluding completed and pending.
- @Nullable private final Instant pollWatermark;
+ public abstract ImmutableMap<HashCode, Instant> getCompleted();
- GrowthState(TerminationStateT terminationState) {
- this.completed = ImmutableMap.of();
- this.pending = ImmutableMap.of();
- this.isOutputComplete = false;
- this.terminationState = checkNotNull(terminationState);
- this.pollWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE;
- }
+ @Nullable
+ public abstract Instant getPollWatermark();
- GrowthState(
- ImmutableMap<HashCode, Instant> completed,
- ImmutableMap<HashCode, TimestampedValue<OutputT>> pending,
- boolean isOutputComplete,
- @Nullable TerminationStateT terminationState,
- @Nullable Instant pollWatermark) {
- if (!isOutputComplete) {
- checkNotNull(terminationState);
- }
- this.completed = completed;
- this.pending = pending;
- this.isOutputComplete = isOutputComplete;
- this.terminationState = terminationState;
- this.pollWatermark = pollWatermark;
- }
-
- public String toString(Growth.TerminationCondition<?, TerminationStateT> terminationCondition) {
- return "GrowthState{"
- + "completed=<"
- + completed.size()
- + " elements>, pending=<"
- + pending.size()
- + " elements"
- + (pending.isEmpty() ? "" : (", earliest " + pending.values().iterator().next()))
- + ">, isOutputComplete="
- + isOutputComplete
- + ", terminationState="
- + terminationCondition.toString(terminationState)
- + ", pollWatermark="
- + pollWatermark
- + '}';
- }
+ public abstract TerminationStateT getTerminationState();
}
@VisibleForTesting
- static class GrowthTracker<OutputT, KeyT, TerminationStateT>
- extends RestrictionTracker<GrowthState<OutputT, KeyT, TerminationStateT>, HashCode> {
+ static class GrowthTracker<OutputT, TerminationStateT>
+ extends RestrictionTracker<GrowthState, KV<Growth.PollResult<OutputT>, TerminationStateT>> {
+
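+    // Restriction representing no remaining work; used as the primary when nothing was claimed,
+    // or as the residual once the pending poll result has been fully claimed.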
+ static final GrowthState EMPTY_STATE =
+ NonPollingGrowthState.of(new PollResult<>(Collections.emptyList(), null));
+
+ // Used to hash values.
private final Funnel<OutputT> coderFunnel;
- private final Growth.TerminationCondition<?, TerminationStateT> terminationCondition;
+
+ // non-null after first successful tryClaim()
+ @Nullable private Growth.PollResult<OutputT> claimedPollResult;
+ @Nullable private TerminationStateT claimedTerminationState;
+ @Nullable private ImmutableMap<HashCode, Instant> claimedHashes;
// The restriction describing the entire work to be done by the current ProcessElement call.
- // Changes only in checkpoint().
- private GrowthState<OutputT, KeyT, TerminationStateT> state;
+ private GrowthState state;
+ // Whether we should stop claiming poll results.
+ private boolean shouldStop;
- // Mutable state changed by the ProcessElement call itself, and used to compute the primary
- // and residual restrictions in checkpoint().
-
- // Remaining pending outputs; initialized from state.pending (if non-empty) or in
- // addNewAsPending(); drained via tryClaimNextPending().
- private Map<HashCode, TimestampedValue<OutputT>> pending;
- // Outputs that have been claimed in the current ProcessElement call. A prefix of "pending".
- private Map<HashCode, TimestampedValue<OutputT>> claimed = Maps.newLinkedHashMap();
- private boolean isOutputComplete;
- @Nullable private TerminationStateT terminationState;
- @Nullable private Instant pollWatermark;
- private boolean shouldStop = false;
-
- GrowthTracker(
- final SerializableFunction<OutputT, KeyT> keyFn,
- final Coder<KeyT> outputKeyCoder,
- GrowthState<OutputT, KeyT, TerminationStateT> state,
- Growth.TerminationCondition<?, TerminationStateT> terminationCondition) {
- this.coderFunnel =
- (from, into) -> {
- try {
- // Rather than hashing the output itself, hash the output key.
- KeyT outputKey = keyFn.apply(from);
- outputKeyCoder.encode(outputKey, Funnels.asOutputStream(into));
- } catch (IOException e) {
- throw new RuntimeException(e);
- }
- };
- this.terminationCondition = terminationCondition;
+ GrowthTracker(GrowthState state, Funnel<OutputT> coderFunnel) {
this.state = state;
- this.isOutputComplete = state.isOutputComplete;
- this.pollWatermark = state.pollWatermark;
- this.terminationState = state.terminationState;
- this.pending = Maps.newLinkedHashMap(state.pending);
+ this.coderFunnel = coderFunnel;
+ this.shouldStop = false;
}
@Override
- public synchronized GrowthState<OutputT, KeyT, TerminationStateT> currentRestriction() {
+ public GrowthState currentRestriction() {
return state;
}
@Override
- public synchronized GrowthState<OutputT, KeyT, TerminationStateT> checkpoint() {
- checkState(
- !claimed.isEmpty(), "Can't checkpoint before any element was successfully claimed");
-
- // primary should contain exactly the work claimed in the current ProcessElement call - i.e.
- // claimed outputs become pending, and it shouldn't poll again.
- GrowthState<OutputT, KeyT, TerminationStateT> primary =
- new GrowthState<>(
- state.completed /* completed */,
- ImmutableMap.copyOf(claimed) /* pending */,
- true /* isOutputComplete */,
- null /* terminationState */,
- BoundedWindow.TIMESTAMP_MAX_VALUE /* pollWatermark */);
-
+ public GrowthState checkpoint() {
// residual should contain exactly the work *not* claimed in the current ProcessElement call -
- // unclaimed pending outputs plus future polling outputs.
- ImmutableMap.Builder<HashCode, Instant> newCompleted = ImmutableMap.builder();
- newCompleted.putAll(state.completed);
- for (Map.Entry<HashCode, TimestampedValue<OutputT>> claimedOutput : claimed.entrySet()) {
- newCompleted.put(claimedOutput.getKey(), claimedOutput.getValue().getTimestamp());
+ // unclaimed pending outputs or future polling outputs.
+ GrowthState residual;
+
+ if (claimedPollResult == null) {
+ // If we have yet to claim anything then our residual becomes all the work we were meant
+ // to do and we update our current restriction to be empty.
+ residual = state;
+ state = EMPTY_STATE;
+ } else if (state instanceof NonPollingGrowthState) {
+ // Since we have claimed the prior poll, our residual is empty.
+ residual = EMPTY_STATE;
+ } else {
+ // Since we claimed a poll result, our primary becomes the poll result and
+ // our residual becomes everything we have claimed in the past + the current poll result.
+
+ PollingGrowthState<TerminationStateT> currentState =
+ (PollingGrowthState<TerminationStateT>) state;
+ ImmutableMap.Builder<HashCode, Instant> newCompleted = ImmutableMap.builder();
+ newCompleted.putAll(currentState.getCompleted());
+ newCompleted.putAll(claimedHashes);
+ residual =
+ PollingGrowthState.of(
+ newCompleted.build(),
+ Ordering.natural()
+ .nullsFirst()
+ .max(currentState.getPollWatermark(), claimedPollResult.watermark),
+ claimedTerminationState);
+ state = NonPollingGrowthState.of(claimedPollResult);
}
- GrowthState<OutputT, KeyT, TerminationStateT> residual =
- new GrowthState<>(
- newCompleted.build() /* completed */,
- ImmutableMap.copyOf(pending) /* pending */,
- isOutputComplete /* isOutputComplete */,
- terminationState,
- pollWatermark);
- // Morph ourselves into primary, except for "pending" - the current call has already claimed
- // everything from it.
- this.state = primary;
- this.isOutputComplete = primary.isOutputComplete;
- this.pollWatermark = primary.pollWatermark;
- this.terminationState = null;
- this.pending = Maps.newLinkedHashMap();
-
- this.shouldStop = true;
+ shouldStop = true;
return residual;
}
@@ -963,133 +1069,61 @@
}
@Override
- public synchronized void checkDone() throws IllegalStateException {
- if (shouldStop) {
- return;
- }
- checkState(!shouldPollMore(), "Polling is still allowed to continue");
- checkState(pending.isEmpty(), "There are %s unclaimed pending outputs", pending.size());
- }
-
- @VisibleForTesting
- synchronized boolean hasPending() {
- return !pending.isEmpty();
- }
-
- private synchronized int getNumPending() {
- return pending.size();
- }
-
- @VisibleForTesting
- @Nullable
- synchronized Map.Entry<HashCode, TimestampedValue<OutputT>> getNextPending() {
- if (pending.isEmpty()) {
- return null;
- }
- return pending.entrySet().iterator().next();
+ public void checkDone() throws IllegalStateException {
+ checkState(
+ shouldStop, "Missing tryClaim()/checkpoint() call. Expected " + "one or the other.");
}
@Override
- protected synchronized boolean tryClaimImpl(HashCode hash) {
+ public boolean tryClaim(KV<Growth.PollResult<OutputT>, TerminationStateT> pollResult) {
if (shouldStop) {
return false;
}
- checkState(!pending.isEmpty(), "No more unclaimed pending outputs");
- TimestampedValue<OutputT> value = pending.remove(hash);
- checkArgument(value != null, "Attempted to claim unknown hash %s", hash);
- claimed.put(hash, value);
+
+ ImmutableMap.Builder<HashCode, Instant> newClaimedHashesBuilder = ImmutableMap.builder();
+ for (TimestampedValue<OutputT> value : pollResult.getKey().getOutputs()) {
+ HashCode hash = hash128(value.getValue());
+ newClaimedHashesBuilder.put(hash, value.getTimestamp());
+ }
+ ImmutableMap<HashCode, Instant> newClaimedHashes = newClaimedHashesBuilder.build();
+
+ if (state instanceof PollingGrowthState) {
+ // If we have previously claimed one of these hashes then return false.
+ if (!Collections.disjoint(
+ newClaimedHashes.keySet(), ((PollingGrowthState) state).getCompleted().keySet())) {
+ return false;
+ }
+ } else {
+ Set<HashCode> expectedHashesToClaim = new HashSet<>();
+ for (TimestampedValue<OutputT> value :
+ ((NonPollingGrowthState<OutputT>) state).getPending().getOutputs()) {
+ expectedHashesToClaim.add(hash128(value.getValue()));
+ }
+ // We expect to claim the entire poll result from a NonPollingGrowthState. This is
+ // stricter than currently required and could be relaxed if this tracker supported
+ // splitting a NonPollingGrowthState into two smaller NonPollingGrowthStates.
+ if (!expectedHashesToClaim.equals(newClaimedHashes.keySet())) {
+ return false;
+ }
+ }
+
+ // Only allow claiming a single poll result at a time.
+ shouldStop = true;
+ claimedPollResult = pollResult.getKey();
+ claimedTerminationState = pollResult.getValue();
+ claimedHashes = newClaimedHashes;
+
return true;
}
- @VisibleForTesting
- synchronized boolean shouldPollMore() {
- return !isOutputComplete
- && !terminationCondition.canStopPolling(Instant.now(), terminationState);
- }
-
- @VisibleForTesting
- synchronized int addNewAsPending(Growth.PollResult<OutputT> pollResult) {
- checkState(
- state.pending.isEmpty(),
- "Should have drained all old pending outputs before adding new, "
- + "but there are %s old pending outputs",
- state.pending.size());
- // Collect results to include as newly pending. Note that the poll result may in theory
- // contain multiple outputs mapping to the the same output key - we need to ignore duplicates
- // here already.
- Map<HashCode, TimestampedValue<OutputT>> newPending = Maps.newHashMap();
- for (TimestampedValue<OutputT> output : pollResult.getOutputs()) {
- OutputT value = output.getValue();
- HashCode hash = hash128(value);
- if (state.completed.containsKey(hash) || newPending.containsKey(hash)) {
- continue;
- }
- // TODO (https://issues.apache.org/jira/browse/BEAM-2680):
- // Consider adding only at most N pending elements and ignoring others,
- // instead relying on future poll rounds to provide them, in order to avoid
- // blowing up the state. Combined with garbage collection of GrowthState.completed,
- // this would make the transform scalable to very large poll results.
- newPending.put(hash, TimestampedValue.of(value, output.getTimestamp()));
- }
- if (!newPending.isEmpty()) {
- terminationState = terminationCondition.onSeenNewOutput(Instant.now(), terminationState);
- }
-
- List<Map.Entry<HashCode, TimestampedValue<OutputT>>> sortedPending =
- Ordering.natural()
- .onResultOf(
- (Map.Entry<HashCode, TimestampedValue<OutputT>> entry) ->
- entry.getValue().getTimestamp())
- .sortedCopy(newPending.entrySet());
- this.pending = Maps.newLinkedHashMap();
- for (Map.Entry<HashCode, TimestampedValue<OutputT>> entry : sortedPending) {
- this.pending.put(entry.getKey(), entry.getValue());
- }
- // If poll result doesn't provide a watermark, assume that future new outputs may
- // arrive with about the same timestamps as the current new outputs.
- if (pollResult.getWatermark() != null) {
- this.pollWatermark = pollResult.getWatermark();
- } else if (!pending.isEmpty()) {
- this.pollWatermark = pending.values().iterator().next().getTimestamp();
- }
- if (BoundedWindow.TIMESTAMP_MAX_VALUE.equals(pollWatermark)) {
- isOutputComplete = true;
- }
- return pending.size();
- }
-
- @VisibleForTesting
- synchronized Instant getWatermark() {
- // Future elements that can be claimed in this restriction come either from
- // "pending" or from future polls, so the total watermark is
- // min(watermark for future polling, earliest remaining pending element)
- return Ordering.natural()
- .nullsLast()
- .min(
- pollWatermark,
- pending.isEmpty() ? null : pending.values().iterator().next().getTimestamp());
- }
-
@Override
- public synchronized String toString() {
- return "GrowthTracker{"
- + "state="
- + state.toString(terminationCondition)
- + ", pending=<"
- + pending.size()
- + " elements"
- + (pending.isEmpty() ? "" : (", earliest " + pending.values().iterator().next()))
- + ">, claimed=<"
- + claimed.size()
- + " elements>, isOutputComplete="
- + isOutputComplete
- + ", terminationState="
- + terminationState
- + ", pollWatermark="
- + pollWatermark
- + ", shouldStop="
- + shouldStop
- + '}';
+ public String toString() {
+ return MoreObjects.toStringHelper(this)
+ .add("state", state)
+ .add("pollResult", claimedPollResult)
+ .add("terminationState", claimedTerminationState)
+ .add("shouldStop", shouldStop)
+ .toString();
}
}
@@ -1118,64 +1152,83 @@
}
}
- private static class GrowthStateCoder<OutputT, KeyT, TerminationStateT>
- extends StructuredCoder<GrowthState<OutputT, KeyT, TerminationStateT>> {
- public static <OutputT, KeyT, TerminationStateT>
- GrowthStateCoder<OutputT, KeyT, TerminationStateT> of(
- Coder<OutputT> outputCoder, Coder<TerminationStateT> terminationStateCoder) {
+ static class GrowthStateCoder<OutputT, TerminationStateT> extends StructuredCoder<GrowthState> {
+
+ private static final int POLLING_GROWTH_STATE = 0;
+ private static final int NON_POLLING_GROWTH_STATE = 1;
+
+ public static <OutputT, TerminationStateT> GrowthStateCoder<OutputT, TerminationStateT> of(
+ Coder<OutputT> outputCoder, Coder<TerminationStateT> terminationStateCoder) {
return new GrowthStateCoder<>(outputCoder, terminationStateCoder);
}
- private static final Coder<Boolean> BOOLEAN_CODER = BooleanCoder.of();
- private static final Coder<Instant> INSTANT_CODER = NullableCoder.of(InstantCoder.of());
- private static final Coder<HashCode> HASH_CODE_CODER = HashCode128Coder.of();
+ private static final MapCoder<HashCode, Instant> COMPLETED_CODER =
+ MapCoder.of(HashCode128Coder.of(), InstantCoder.of());
+ private static final Coder<Instant> NULLABLE_INSTANT_CODER =
+ NullableCoder.of(InstantCoder.of());
private final Coder<OutputT> outputCoder;
- private final Coder<Map<HashCode, Instant>> completedCoder;
- private final Coder<TimestampedValue<OutputT>> timestampedOutputCoder;
+ private final Coder<List<TimestampedValue<OutputT>>> timestampedOutputCoder;
private final Coder<TerminationStateT> terminationStateCoder;
private GrowthStateCoder(
Coder<OutputT> outputCoder, Coder<TerminationStateT> terminationStateCoder) {
this.outputCoder = outputCoder;
this.terminationStateCoder = terminationStateCoder;
- this.completedCoder = MapCoder.of(HASH_CODE_CODER, INSTANT_CODER);
- this.timestampedOutputCoder = TimestampedValue.TimestampedValueCoder.of(outputCoder);
+ this.timestampedOutputCoder =
+ ListCoder.of(TimestampedValue.TimestampedValueCoder.of(outputCoder));
}
@Override
- public void encode(GrowthState<OutputT, KeyT, TerminationStateT> value, OutputStream os)
+ public void encode(GrowthState value, OutputStream os) throws IOException {
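+      // Write a varint type tag first so decode() knows which GrowthState subclass to reconstruct.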
+ if (value instanceof PollingGrowthState) {
+ VarInt.encode(POLLING_GROWTH_STATE, os);
+ encodePollingGrowthState((PollingGrowthState<TerminationStateT>) value, os);
+ } else if (value instanceof NonPollingGrowthState) {
+ VarInt.encode(NON_POLLING_GROWTH_STATE, os);
+ encodeNonPollingGrowthState((NonPollingGrowthState<OutputT>) value, os);
+ } else {
+ throw new IOException("Unknown growth state: " + value);
+ }
+ }
+
+ private void encodePollingGrowthState(
+ PollingGrowthState<TerminationStateT> value, OutputStream os) throws IOException {
+ terminationStateCoder.encode(value.getTerminationState(), os);
+ NULLABLE_INSTANT_CODER.encode(value.getPollWatermark(), os);
+ COMPLETED_CODER.encode(value.getCompleted(), os);
+ }
+
+ private void encodeNonPollingGrowthState(NonPollingGrowthState<OutputT> value, OutputStream os)
throws IOException {
- completedCoder.encode(value.completed, os);
- VarIntCoder.of().encode(value.pending.size(), os);
- for (Map.Entry<HashCode, TimestampedValue<OutputT>> entry : value.pending.entrySet()) {
- HASH_CODE_CODER.encode(entry.getKey(), os);
- timestampedOutputCoder.encode(entry.getValue(), os);
- }
- BOOLEAN_CODER.encode(value.isOutputComplete, os);
- terminationStateCoder.encode(value.terminationState, os);
- INSTANT_CODER.encode(value.pollWatermark, os);
+ NULLABLE_INSTANT_CODER.encode(value.getPending().getWatermark(), os);
+ timestampedOutputCoder.encode(value.getPending().getOutputs(), os);
}
@Override
- public GrowthState<OutputT, KeyT, TerminationStateT> decode(InputStream is) throws IOException {
- Map<HashCode, Instant> completed = completedCoder.decode(is);
- int numPending = VarIntCoder.of().decode(is);
- ImmutableMap.Builder<HashCode, TimestampedValue<OutputT>> pending = ImmutableMap.builder();
- for (int i = 0; i < numPending; ++i) {
- HashCode hash = HASH_CODE_CODER.decode(is);
- TimestampedValue<OutputT> output = timestampedOutputCoder.decode(is);
- pending.put(hash, output);
+ public GrowthState decode(InputStream is) throws IOException {
+ int type = VarInt.decodeInt(is);
+ switch (type) {
+ case NON_POLLING_GROWTH_STATE:
+ return decodeNonPollingGrowthState(is);
+ case POLLING_GROWTH_STATE:
+ return decodePollingGrowthState(is);
+ default:
+ throw new IOException("Unknown growth state type " + type);
}
- boolean isOutputComplete = BOOLEAN_CODER.decode(is);
+ }
+
+ private GrowthState decodeNonPollingGrowthState(InputStream is) throws IOException {
+ Instant watermark = NULLABLE_INSTANT_CODER.decode(is);
+ List<TimestampedValue<OutputT>> values = timestampedOutputCoder.decode(is);
+ return NonPollingGrowthState.of(new Growth.PollResult<>(values, watermark));
+ }
+
+ private GrowthState decodePollingGrowthState(InputStream is) throws IOException {
TerminationStateT terminationState = terminationStateCoder.decode(is);
- Instant pollWatermark = INSTANT_CODER.decode(is);
- return new GrowthState<>(
- ImmutableMap.copyOf(completed),
- pending.build(),
- isOutputComplete,
- terminationState,
- pollWatermark);
+ Instant watermark = NULLABLE_INSTANT_CODER.decode(is);
+ Map<HashCode, Instant> completed = COMPLETED_CODER.decode(is);
+ return PollingGrowthState.of(ImmutableMap.copyOf(completed), watermark, terminationState);
}
@Override
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java
index 27db217..438a918 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java
@@ -89,7 +89,7 @@
/** Invoke the {@link DoFn.NewTracker} method on the bound {@link DoFn}. */
@SuppressWarnings("TypeParameterUnusedInFormals")
- <RestrictionT, TrackerT extends RestrictionTracker<RestrictionT, ?>> TrackerT invokeNewTracker(
+ <RestrictionT, PositionT> RestrictionTracker<RestrictionT, PositionT> invokeNewTracker(
RestrictionT restriction);
/** Get the bound {@link DoFn}. */
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
index c1aa7ae..9889adc 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
@@ -552,9 +552,6 @@
ErrorReporter processElementErrors =
errors.forMethod(DoFn.ProcessElement.class, processElement.targetMethod());
- final TypeDescriptor<?> trackerT;
- final String originOfTrackerT;
-
List<String> missingRequiredMethods = new ArrayList<>();
if (getInitialRestriction == null) {
missingRequiredMethods.add("@" + DoFn.GetInitialRestriction.class.getSimpleName());
@@ -564,27 +561,11 @@
&& getInitialRestriction
.restrictionT()
.isSubtypeOf(TypeDescriptor.of(HasDefaultTracker.class))) {
- trackerT =
- getInitialRestriction
- .restrictionT()
- .resolveType(HasDefaultTracker.class.getTypeParameters()[1]);
- originOfTrackerT =
- String.format(
- "restriction type %s of @%s method %s",
- formatType(getInitialRestriction.restrictionT()),
- DoFn.GetInitialRestriction.class.getSimpleName(),
- format(getInitialRestriction.targetMethod()));
+ // no-op we are using the annotation @HasDefaultTracker
} else {
missingRequiredMethods.add("@" + DoFn.NewTracker.class.getSimpleName());
- trackerT = null;
- originOfTrackerT = null;
}
} else {
- trackerT = newTracker.trackerT();
- originOfTrackerT =
- String.format(
- "%s method %s",
- DoFn.NewTracker.class.getSimpleName(), format(newTracker.targetMethod()));
ErrorReporter getInitialRestrictionErrors =
errors.forMethod(DoFn.GetInitialRestriction.class, getInitialRestriction.targetMethod());
TypeDescriptor<?> restrictionT = getInitialRestriction.restrictionT();
@@ -607,11 +588,9 @@
errors.forMethod(DoFn.GetInitialRestriction.class, getInitialRestriction.targetMethod());
TypeDescriptor<?> restrictionT = getInitialRestriction.restrictionT();
processElementErrors.checkArgument(
- processElement.trackerT().equals(trackerT),
- "Has tracker type %s, but the DoFn's tracker type was inferred as %s from %s",
- formatType(processElement.trackerT()),
- trackerT,
- originOfTrackerT);
+ processElement.trackerT().getRawType().equals(RestrictionTracker.class),
+ "Has tracker type %s, but the DoFn's tracker type must be of type RestrictionTracker.",
+ formatType(processElement.trackerT()));
if (getRestrictionCoder != null) {
getInitialRestrictionErrors.checkArgument(
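With this check in place, a splittable @ProcessElement must declare its tracker parameter as the generic RestrictionTracker rather than a concrete tracker subclass. A minimal illustrative DoFn, assuming OffsetRange from org.apache.beam.sdk.io.range (hypothetical class, not part of this change; imports omitted):

class EmitOffsetsFn extends DoFn<String, Long> {
  @ProcessElement
  public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); ++i) {
      c.output(i);
    }
  }

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(String element) {
    // OffsetRange declares a default tracker, so no @NewTracker method is needed here.
    return new OffsetRange(0, element.length());
  }
}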
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
index 6f72d84..44f2f0b 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
@@ -56,12 +56,12 @@
}
@Override
- public synchronized ByteKeyRange currentRestriction() {
+ public ByteKeyRange currentRestriction() {
return range;
}
@Override
- public synchronized ByteKeyRange checkpoint() {
+ public ByteKeyRange checkpoint() {
// If we haven't done any work, we should return the original range we were processing
// as the checkpoint.
if (lastAttemptedKey == null) {
@@ -99,7 +99,7 @@
* current {@link ByteKeyRange} of this tracker.
*/
@Override
- protected synchronized boolean tryClaimImpl(ByteKey key) {
+ public boolean tryClaim(ByteKey key) {
// Handle claiming the end of range EMPTY key
if (key.isEmpty()) {
checkArgument(
@@ -132,7 +132,7 @@
}
@Override
- public synchronized void checkDone() throws IllegalStateException {
+ public void checkDone() throws IllegalStateException {
// Handle checking the empty range which is implicitly done.
// This case can occur if the range tracker is checkpointed before any keys have been claimed
// or if the range tracker is checkpointed once the range is done.
@@ -162,7 +162,7 @@
}
@Override
- public synchronized String toString() {
+ public String toString() {
return MoreObjects.toStringHelper(this)
.add("range", range)
.add("lastClaimedKey", lastClaimedKey)
@@ -184,7 +184,7 @@
private static final byte[] ZERO_BYTE_ARRAY = new byte[] {0};
@Override
- public synchronized Backlog getBacklog() {
+ public Backlog getBacklog() {
// Return 0 for the empty range which is implicitly done.
// This case can occur if the range tracker is checkpointed before any keys have been claimed
// or if the range tracker is checkpointed once the range is done.
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
index 549aa9b..9d90c69 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java
@@ -41,12 +41,12 @@
}
@Override
- public synchronized OffsetRange currentRestriction() {
+ public OffsetRange currentRestriction() {
return range;
}
@Override
- public synchronized OffsetRange checkpoint() {
+ public OffsetRange checkpoint() {
checkState(
lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed");
OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo());
@@ -63,7 +63,7 @@
* current {@link OffsetRange} of this tracker (in that case this operation is a no-op).
*/
@Override
- protected synchronized boolean tryClaimImpl(Long i) {
+ public boolean tryClaim(Long i) {
checkArgument(
lastAttemptedOffset == null || i > lastAttemptedOffset,
"Trying to claim offset %s while last attempted was %s",
@@ -81,7 +81,7 @@
}
@Override
- public synchronized void checkDone() throws IllegalStateException {
+ public void checkDone() throws IllegalStateException {
checkState(
lastAttemptedOffset >= range.getTo() - 1,
"Last attempted offset was %s in range %s, claiming work in [%s, %s) was not attempted",
@@ -101,7 +101,7 @@
}
@Override
- public synchronized Backlog getBacklog() {
+ public Backlog getBacklog() {
// If we have never attempted an offset, we return the length of the entire range.
if (lastAttemptedOffset == null) {
return Backlog.of(BigDecimal.valueOf(range.getTo() - range.getFrom()));
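The same pattern applies to offsets: tryClaim is called directly on the tracker until it refuses a claim, after which checkDone succeeds. An illustrative loop over an assumed [0, 10) range:

OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0, 10));
for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); ++i) {
  // process offset i
}
tracker.checkDone(); // every offset in [0, 10) has been attempted, so this passes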
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java
index c697f5a..1ed6a97 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java
@@ -17,41 +17,13 @@
*/
package org.apache.beam.sdk.transforms.splittabledofn;
-import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkNotNull;
-import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkState;
-
-import javax.annotation.Nullable;
-import org.apache.beam.sdk.annotations.Internal;
import org.apache.beam.sdk.transforms.DoFn;
/**
- * Manages concurrent access to the restriction and keeps track of its claimed part for a <a
+ * Manages access to the restriction and keeps track of its claimed part for a <a
* href="https://s.apache.org/splittable-do-fn">splittable</a> {@link DoFn}.
*/
public abstract class RestrictionTracker<RestrictionT, PositionT> {
- /** Internal interface allowing a runner to observe the calls to {@link #tryClaim}. */
- @Internal
- public interface ClaimObserver<PositionT> {
- /** Called when {@link #tryClaim} returns true. */
- void onClaimed(PositionT position);
-
- /** Called when {@link #tryClaim} returns false. */
- void onClaimFailed(PositionT position);
- }
-
- @Nullable private ClaimObserver<PositionT> claimObserver;
-
- /**
- * Sets a {@link ClaimObserver} to be invoked on every call to {@link #tryClaim}. Internal:
- * intended only for runner authors.
- */
- @Internal
- public void setClaimObserver(ClaimObserver<PositionT> claimObserver) {
- checkNotNull(claimObserver, "claimObserver");
- checkState(this.claimObserver == null, "A claim observer has already been set");
- this.claimObserver = claimObserver;
- }
-
/**
* Attempts to claim the block of work in the current restriction identified by the given
* position.
@@ -65,27 +37,8 @@
* call to this method).
* <li>{@link RestrictionTracker#checkDone} MUST succeed.
* </ul>
- *
- * <p>Under the hood, calls {@link #tryClaimImpl} and notifies {@link ClaimObserver} of the
- * result.
*/
- public final boolean tryClaim(PositionT position) {
- if (tryClaimImpl(position)) {
- if (claimObserver != null) {
- claimObserver.onClaimed(position);
- }
- return true;
- } else {
- if (claimObserver != null) {
- claimObserver.onClaimFailed(position);
- }
- return false;
- }
- }
-
- /** Tracker-specific implementation of {@link #tryClaim}. */
- @Internal
- protected abstract boolean tryClaimImpl(PositionT position);
+ public abstract boolean tryClaim(PositionT position);
/**
* Returns a restriction accurately describing the full range of work the current {@link
@@ -102,8 +55,7 @@
* work: the old value of {@link #currentRestriction} is equivalent to the new value and the
* return value of this method combined.
*
- * <p>Must be called at most once on a given object. Must not be called before the first
- * successful {@link #tryClaim} call.
+ * <p>Must be called at most once on a given object.
*/
public abstract RestrictionT checkpoint();
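With ClaimObserver and tryClaimImpl removed, a tracker subclass now overrides tryClaim directly. A minimal sketch of a single-claim tracker, implementing only the abstract methods visible in this file (hypothetical class, for illustration only):

class OneShotTracker extends RestrictionTracker<Void, Void> {
  private boolean claimed = false;

  @Override
  public boolean tryClaim(Void position) {
    // The first claim succeeds; every later attempt fails, telling the DoFn to stop.
    if (claimed) {
      return false;
    }
    claimed = true;
    return true;
  }

  @Override
  public Void currentRestriction() {
    return null;
  }

  @Override
  public Void checkpoint() {
    return null; // nothing left to split off in this trivial restriction
  }

  @Override
  public void checkDone() {}
}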
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java
index 2554b3b..5f91baa 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/options/PipelineOptionsFactoryTest.java
@@ -1437,11 +1437,14 @@
private interface NonPublicPipelineOptions extends PipelineOptions {}
@Test
- public void testNonPublicInterfaceLogsWarning() throws Exception {
+ public void testNonPublicInterfaceThrowsException() throws Exception {
+ expectedException.expect(IllegalArgumentException.class);
+ expectedException.expectMessage(
+ "Please mark non-public interface "
+ + NonPublicPipelineOptions.class.getName()
+ + " as public.");
+
PipelineOptionsFactory.as(NonPublicPipelineOptions.class);
- // Make sure we print the name of the class.
- expectedLogs.verifyWarn(NonPublicPipelineOptions.class.getName());
- expectedLogs.verifyWarn("all non-public interfaces to be in the same package");
}
/** A test interface containing all supported List return types. */
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
index 9a6f9b3..c86f8f6 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
@@ -47,7 +47,7 @@
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.apache.beam.sdk.transforms.DoFn.BoundedPerElement;
import org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement;
-import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.Never;
@@ -80,7 +80,8 @@
static class PairStringWithIndexToLengthBase extends DoFn<String, KV<String, Integer>> {
@ProcessElement
- public ProcessContinuation process(ProcessContext c, OffsetRangeTracker tracker) {
+ public ProcessContinuation process(
+ ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
for (long i = tracker.currentRestriction().getFrom(), numIterations = 0;
tracker.tryClaim(i);
++i, ++numIterations) {
@@ -238,7 +239,8 @@
}
@ProcessElement
- public ProcessContinuation processElement(ProcessContext c, OffsetRangeTracker tracker) {
+ public ProcessContinuation processElement(
+ ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
int[] blockStarts = {-1, 0, 12, 123, 1234, 12345, 34567, MAX_INDEX};
int trueStart = snapToNextBlock((int) tracker.currentRestriction().getFrom(), blockStarts);
for (int i = trueStart, numIterations = 1;
@@ -317,7 +319,7 @@
}
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
checkState(tracker.tryClaim(tracker.currentRestriction().getFrom()));
String side = c.sideInput(sideInput);
c.output(side + ":" + c.element());
@@ -449,7 +451,8 @@
}
@ProcessElement
- public ProcessContinuation processElement(ProcessContext c, OffsetRangeTracker tracker) {
+ public ProcessContinuation processElement(
+ ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
int[] blockStarts = {-1, 0, 12, 123, 1234, 12345, 34567, MAX_INDEX};
int trueStart = snapToNextBlock((int) tracker.currentRestriction().getFrom(), blockStarts);
for (int i = trueStart, numIterations = 1;
@@ -571,7 +574,7 @@
}
@ProcessElement
- public void process(ProcessContext c, OffsetRangeTracker tracker) {
+ public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
checkState(tracker.tryClaim(tracker.currentRestriction().getFrom()));
c.output("main:" + c.element());
c.output(additionalOutput, "additional:" + c.element());
@@ -712,7 +715,7 @@
}
@ProcessElement
- public void processElement(ProcessContext c, OffsetRangeTracker tracker) {
+ public void processElement(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
assertEquals(State.INSIDE_BUNDLE, state);
assertTrue(tracker.tryClaim(0L));
c.output(c.element());
@@ -774,7 +777,8 @@
ParDo.of(
new DoFn<String, String>() {
@ProcessElement
- public void process(@Element String element, OffsetRangeTracker tracker) {
+ public void process(
+ @Element String element, RestrictionTracker<OffsetRange, Long> tracker) {
// Doesn't matter
}
@@ -792,7 +796,7 @@
new DoFn<String, String>() {
@ProcessElement
public ProcessContinuation process(
- @Element String element, OffsetRangeTracker tracker) {
+ @Element String element, RestrictionTracker<OffsetRange, Long> tracker) {
return stop();
}
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java
index b762161..ad7813b 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WatchTest.java
@@ -22,26 +22,29 @@
import static org.apache.beam.sdk.transforms.Watch.Growth.allOf;
import static org.apache.beam.sdk.transforms.Watch.Growth.eitherOf;
import static org.apache.beam.sdk.transforms.Watch.Growth.never;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.containsInAnyOrder;
import static org.hamcrest.Matchers.greaterThan;
import static org.joda.time.Duration.standardSeconds;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
+import java.io.IOException;
import java.io.Serializable;
import java.util.Arrays;
-import java.util.Collections;
import java.util.List;
-import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.testing.CoderProperties;
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
@@ -50,16 +53,21 @@
import org.apache.beam.sdk.transforms.Watch.Growth.PollResult;
import org.apache.beam.sdk.transforms.Watch.GrowthState;
import org.apache.beam.sdk.transforms.Watch.GrowthTracker;
-import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.transforms.Watch.NonPollingGrowthState;
+import org.apache.beam.sdk.transforms.Watch.PollingGrowthState;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TimestampedValue;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Function;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Ordering;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.hash.Funnel;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.hash.Funnels;
import org.apache.beam.vendor.guava.v20_0.com.google.common.hash.HashCode;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.hash.Hashing;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.joda.time.ReadableDuration;
@@ -72,6 +80,7 @@
/** Tests for {@link Watch}. */
@RunWith(JUnit4.class)
public class WatchTest implements Serializable {
+
@Rule public transient TestPipeline p = TestPipeline.create();
@Test
@@ -337,15 +346,11 @@
}
};
- Ordering<TimestampedValue<Integer>> byValue =
- Ordering.natural().onResultOf(extractValueFn);
Ordering<TimestampedValue<Integer>> byTimestamp =
Ordering.natural().onResultOf(extractTimestampFn);
// New outputs appear in timestamp order because each output's assigned timestamp
// is Instant.now() at the time of poll.
- assertTrue(
- "Outputs must be in timestamp order",
- byTimestamp.isOrdered(byValue.sortedCopy(outputs)));
+ assertTrue("Outputs must be in timestamp order", byTimestamp.isOrdered(outputs));
assertEquals(
"Yields all expected values",
numResults,
@@ -368,11 +373,31 @@
p.run();
}
+ @Test
+ public void testCoder() throws Exception {
+ GrowthState pollingState =
+ PollingGrowthState.of(
+ ImmutableMap.of(
+ HashCode.fromString("0123456789abcdef0123456789abcdef"), Instant.now(),
+ HashCode.fromString("01230123012301230123012301230123"), Instant.now()),
+ Instant.now(),
+ "STATE");
+ GrowthState nonPollingState =
+ NonPollingGrowthState.of(
+ Growth.PollResult.incomplete(Instant.now(), Arrays.asList("A", "B")));
+ Coder<GrowthState> coder =
+ Watch.GrowthStateCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of());
+
+ CoderProperties.coderDecodeEncodeEqual(coder, pollingState);
+ CoderProperties.coderDecodeEncodeEqual(coder, nonPollingState);
+ }
+
/**
* Gradually emits all items from the given list, pairing each one with a UUID that identifies the
* round of polling, so a client can check how many rounds of polling there were.
*/
private static class TimedPollFn<InputT, OutputT> extends PollFn<InputT, OutputT> {
+
private final Instant baseTime;
private final List<OutputT> outputs;
private final Duration timeToOutputEverything;
@@ -484,319 +509,177 @@
assertTrue(c.canStopPolling(now.plus(standardSeconds(12)), state));
}
- private static GrowthTracker<String, String, Integer> newTracker(
- GrowthState<String, String, Integer> state) {
- return new GrowthTracker<>(
- SerializableFunctions.identity(), StringUtf8Coder.of(), state, never());
+ private static GrowthTracker<String, Integer> newTracker(GrowthState state) {
+ Funnel<String> coderFunnel =
+ (from, into) -> {
+ try {
+ StringUtf8Coder.of().encode(from, Funnels.asOutputStream(into));
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ };
+ return new GrowthTracker<>(state, coderFunnel);
}
- private static GrowthTracker<String, String, Integer> newTracker() {
- return newTracker(new GrowthState<>(never().forNewInput(Instant.now(), null)));
+ private static HashCode hash128(String value) {
+ Funnel<String> coderFunnel =
+ (from, into) -> {
+ try {
+ StringUtf8Coder.of().encode(from, Funnels.asOutputStream(into));
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ };
+ return Hashing.murmur3_128().hashObject(value, coderFunnel);
}
- private String tryClaimNextPending(GrowthTracker<String, ?, ?> tracker) {
- assertTrue(tracker.hasPending());
- Map.Entry<HashCode, TimestampedValue<String>> entry = tracker.getNextPending();
- tracker.tryClaim(entry.getKey());
- return entry.getValue().getValue();
+ private static GrowthTracker<String, Integer> newPollingGrowthTracker() {
+ return newTracker(PollingGrowthState.of(never().forNewInput(Instant.now(), null)));
}
@Test
- public void testGrowthTrackerCheckpointNonEmpty() {
+ public void testPollingGrowthTrackerCheckpointNonEmpty() {
Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
+ GrowthTracker<String, Integer> tracker = newPollingGrowthTracker();
+
+ PollResult<String> claim =
PollResult.incomplete(
Arrays.asList(
TimestampedValue.of("d", now.plus(standardSeconds(4))),
TimestampedValue.of("c", now.plus(standardSeconds(3))),
TimestampedValue.of("a", now.plus(standardSeconds(1))),
TimestampedValue.of("b", now.plus(standardSeconds(2)))))
- .withWatermark(now.plus(standardSeconds(7))));
+ .withWatermark(now.plus(standardSeconds(7)));
- assertEquals(now.plus(standardSeconds(1)), tracker.getWatermark());
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertTrue(tracker.hasPending());
- assertEquals(now.plus(standardSeconds(3)), tracker.getWatermark());
+ assertTrue(tracker.tryClaim(KV.of(claim, 1 /* termination state */)));
- GrowthTracker<String, String, Integer> residualTracker = newTracker(tracker.checkpoint());
- GrowthTracker<String, String, Integer> primaryTracker =
- newTracker(tracker.currentRestriction());
+ PollingGrowthState<Integer> residual = (PollingGrowthState<Integer>) tracker.checkpoint();
+ NonPollingGrowthState<String> primary =
+ (NonPollingGrowthState<String>) tracker.currentRestriction();
+ tracker.checkDone();
// Verify primary: should contain what the current tracker claimed, and nothing else.
- assertEquals(now.plus(standardSeconds(1)), primaryTracker.getWatermark());
- assertEquals("a", tryClaimNextPending(primaryTracker));
- assertEquals("b", tryClaimNextPending(primaryTracker));
- assertFalse(primaryTracker.hasPending());
- assertFalse(primaryTracker.shouldPollMore());
- // No more pending elements in primary restriction, and no polling.
- primaryTracker.checkDone();
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, primaryTracker.getWatermark());
+ assertEquals(claim, primary.getPending());
// Verify residual: should contain what the current tracker didn't claim.
- assertEquals(now.plus(standardSeconds(3)), residualTracker.getWatermark());
- assertEquals("c", tryClaimNextPending(residualTracker));
- assertEquals("d", tryClaimNextPending(residualTracker));
- assertFalse(residualTracker.hasPending());
- assertTrue(residualTracker.shouldPollMore());
- // No more pending elements in residual restriction, but poll watermark still holds.
- assertEquals(now.plus(standardSeconds(7)), residualTracker.getWatermark());
-
- // Verify current tracker: it was checkpointed, so should contain nothing else.
- assertFalse(tracker.hasPending());
- tracker.checkDone();
- assertFalse(tracker.shouldPollMore());
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, tracker.getWatermark());
+ assertEquals(now.plus(standardSeconds(7)), residual.getPollWatermark());
+ assertThat(
+ residual.getCompleted().keySet(),
+ containsInAnyOrder(hash128("a"), hash128("b"), hash128("c"), hash128("d")));
+ assertEquals(1, (int) residual.getTerminationState());
}
@Test
- public void testGrowthTrackerOutputFullyBeforeCheckpointIncomplete() {
- Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2)))))
- .withWatermark(now.plus(standardSeconds(7))));
+ public void testPollingGrowthTrackerCheckpointEmpty() {
+ GrowthTracker<String, Integer> tracker = newPollingGrowthTracker();
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertEquals("c", tryClaimNextPending(tracker));
- assertEquals("d", tryClaimNextPending(tracker));
- assertFalse(tracker.hasPending());
- assertEquals(now.plus(standardSeconds(7)), tracker.getWatermark());
-
- GrowthTracker<String, String, Integer> residualTracker = newTracker(tracker.checkpoint());
- GrowthTracker<String, String, Integer> primaryTracker =
- newTracker(tracker.currentRestriction());
+ PollingGrowthState<Integer> residual = (PollingGrowthState<Integer>) tracker.checkpoint();
+ GrowthState primary = tracker.currentRestriction();
+ tracker.checkDone();
// Verify primary: should contain what the current tracker claimed, and nothing else.
- assertEquals(now.plus(standardSeconds(1)), primaryTracker.getWatermark());
- assertEquals("a", tryClaimNextPending(primaryTracker));
- assertEquals("b", tryClaimNextPending(primaryTracker));
- assertEquals("c", tryClaimNextPending(primaryTracker));
- assertEquals("d", tryClaimNextPending(primaryTracker));
- assertFalse(primaryTracker.hasPending());
- assertFalse(primaryTracker.shouldPollMore());
- // No more pending elements in primary restriction, and no polling.
- primaryTracker.checkDone();
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, primaryTracker.getWatermark());
+ assertEquals(GrowthTracker.EMPTY_STATE, primary);
// Verify residual: should contain what the current tracker didn't claim.
- assertFalse(residualTracker.hasPending());
- assertTrue(residualTracker.shouldPollMore());
- // No more pending elements in residual restriction, but poll watermark still holds.
- assertEquals(now.plus(standardSeconds(7)), residualTracker.getWatermark());
-
- // Verify current tracker: it was checkpointed, so should contain nothing else.
- tracker.checkDone();
- assertFalse(tracker.hasPending());
- assertFalse(tracker.shouldPollMore());
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, tracker.getWatermark());
+ assertNull(residual.getPollWatermark());
+ assertEquals(0, residual.getCompleted().size());
+ assertEquals(0, (int) residual.getTerminationState());
}
@Test
- public void testGrowthTrackerPollAfterCheckpointIncompleteWithNewOutputs() {
+ public void testPollingGrowthTrackerHashAlreadyClaimed() {
Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
+ GrowthTracker<String, Integer> tracker = newPollingGrowthTracker();
+
+ PollResult<String> claim =
PollResult.incomplete(
Arrays.asList(
TimestampedValue.of("d", now.plus(standardSeconds(4))),
TimestampedValue.of("c", now.plus(standardSeconds(3))),
TimestampedValue.of("a", now.plus(standardSeconds(1))),
TimestampedValue.of("b", now.plus(standardSeconds(2)))))
- .withWatermark(now.plus(standardSeconds(7))));
+ .withWatermark(now.plus(standardSeconds(7)));
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertEquals("c", tryClaimNextPending(tracker));
- assertEquals("d", tryClaimNextPending(tracker));
+ assertTrue(tracker.tryClaim(KV.of(claim, 1 /* termination state */)));
- GrowthState<String, String, Integer> checkpoint = tracker.checkpoint();
- // Simulate resuming from the checkpoint and adding more elements.
- {
- GrowthTracker<String, String, Integer> residualTracker = newTracker(checkpoint);
- residualTracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("e", now.plus(standardSeconds(5))),
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))),
- TimestampedValue.of("f", now.plus(standardSeconds(8)))))
- .withWatermark(now.plus(standardSeconds(12))));
+ PollingGrowthState<Integer> residual = (PollingGrowthState<Integer>) tracker.checkpoint();
- assertEquals(now.plus(standardSeconds(5)), residualTracker.getWatermark());
- assertEquals("e", tryClaimNextPending(residualTracker));
- assertEquals(now.plus(standardSeconds(8)), residualTracker.getWatermark());
- assertEquals("f", tryClaimNextPending(residualTracker));
-
- assertFalse(residualTracker.hasPending());
- assertTrue(residualTracker.shouldPollMore());
- assertEquals(now.plus(standardSeconds(12)), residualTracker.getWatermark());
- }
- // Try same without an explicitly specified watermark.
- {
- GrowthTracker<String, String, Integer> residualTracker = newTracker(checkpoint);
- residualTracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("e", now.plus(standardSeconds(5))),
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))),
- TimestampedValue.of("f", now.plus(standardSeconds(8))))));
-
- assertEquals(now.plus(standardSeconds(5)), residualTracker.getWatermark());
- assertEquals("e", tryClaimNextPending(residualTracker));
- assertEquals(now.plus(standardSeconds(5)), residualTracker.getWatermark());
- assertEquals("f", tryClaimNextPending(residualTracker));
-
- assertFalse(residualTracker.hasPending());
- assertTrue(residualTracker.shouldPollMore());
- assertEquals(now.plus(standardSeconds(5)), residualTracker.getWatermark());
- }
+ assertFalse(newTracker(residual).tryClaim(KV.of(claim, 2)));
}
@Test
- public void testGrowthTrackerPollAfterCheckpointWithoutNewOutputs() {
+ public void testNonPollingGrowthTrackerCheckpointNonEmpty() {
Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
+ PollResult<String> claim =
PollResult.incomplete(
Arrays.asList(
TimestampedValue.of("d", now.plus(standardSeconds(4))),
TimestampedValue.of("c", now.plus(standardSeconds(3))),
TimestampedValue.of("a", now.plus(standardSeconds(1))),
TimestampedValue.of("b", now.plus(standardSeconds(2)))))
- .withWatermark(now.plus(standardSeconds(7))));
+ .withWatermark(now.plus(standardSeconds(7)));
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertEquals("c", tryClaimNextPending(tracker));
- assertEquals("d", tryClaimNextPending(tracker));
+ GrowthTracker<String, Integer> tracker = newTracker(NonPollingGrowthState.of(claim));
- // Simulate resuming from the checkpoint but there are no new elements.
- GrowthState<String, String, Integer> checkpoint = tracker.checkpoint();
- {
- GrowthTracker<String, String, Integer> residualTracker = newTracker(checkpoint);
- residualTracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2)))))
- .withWatermark(now.plus(standardSeconds(12))));
-
- assertFalse(residualTracker.hasPending());
- assertTrue(residualTracker.shouldPollMore());
- assertEquals(now.plus(standardSeconds(12)), residualTracker.getWatermark());
- }
- // Try the same without an explicitly specified watermark
- {
- GrowthTracker<String, String, Integer> residualTracker = newTracker(checkpoint);
- residualTracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))))));
- // No new elements and no explicit watermark supplied - should reuse old watermark.
- assertEquals(now.plus(standardSeconds(7)), residualTracker.getWatermark());
- }
- }
-
- @Test
- public void testGrowthTrackerPollAfterCheckpointWithoutNewOutputsNoWatermark() {
- Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))))));
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertEquals("c", tryClaimNextPending(tracker));
- assertEquals("d", tryClaimNextPending(tracker));
- assertEquals(now.plus(standardSeconds(1)), tracker.getWatermark());
-
- // Simulate resuming from the checkpoint but there are no new elements.
- GrowthState<String, String, Integer> checkpoint = tracker.checkpoint();
- GrowthTracker<String, String, Integer> residualTracker = newTracker(checkpoint);
- residualTracker.addNewAsPending(
- PollResult.incomplete(
- Arrays.asList(
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))))));
- // No new elements and no explicit watermark supplied - should keep old watermark.
- assertEquals(now.plus(standardSeconds(1)), residualTracker.getWatermark());
- }
-
- @Test
- public void testGrowthTrackerRepeatedEmptyPollWatermark() {
- // Empty poll result with no watermark
- {
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(PollResult.incomplete(Collections.emptyList()));
- assertEquals(BoundedWindow.TIMESTAMP_MIN_VALUE, tracker.getWatermark());
- }
- // Empty poll result with watermark
- {
- Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
- PollResult.incomplete(Collections.<TimestampedValue<String>>emptyList())
- .withWatermark(now));
- assertEquals(now, tracker.getWatermark());
- }
- }
-
- @Test
- public void testGrowthTrackerOutputFullyBeforeCheckpointComplete() {
- Instant now = Instant.now();
- GrowthTracker<String, String, Integer> tracker = newTracker();
- tracker.addNewAsPending(
- PollResult.complete(
- Arrays.asList(
- TimestampedValue.of("d", now.plus(standardSeconds(4))),
- TimestampedValue.of("c", now.plus(standardSeconds(3))),
- TimestampedValue.of("a", now.plus(standardSeconds(1))),
- TimestampedValue.of("b", now.plus(standardSeconds(2))))));
-
- assertEquals("a", tryClaimNextPending(tracker));
- assertEquals("b", tryClaimNextPending(tracker));
- assertEquals("c", tryClaimNextPending(tracker));
- assertEquals("d", tryClaimNextPending(tracker));
- assertFalse(tracker.hasPending());
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, tracker.getWatermark());
-
- GrowthTracker<String, String, Integer> residualTracker = newTracker(tracker.checkpoint());
-
- // Verify residual: should be empty, since output was final.
- residualTracker.checkDone();
- assertFalse(residualTracker.hasPending());
- assertFalse(residualTracker.shouldPollMore());
- // No more pending elements in residual restriction, but poll watermark still holds.
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, residualTracker.getWatermark());
-
- // Verify current tracker: it was checkpointed, so should contain nothing else.
+ assertTrue(tracker.tryClaim(KV.of(claim, 1 /* termination state */)));
+ GrowthState residual = tracker.checkpoint();
+ NonPollingGrowthState<String> primary =
+ (NonPollingGrowthState<String>) tracker.currentRestriction();
tracker.checkDone();
- assertFalse(tracker.hasPending());
- assertFalse(tracker.shouldPollMore());
- assertEquals(BoundedWindow.TIMESTAMP_MAX_VALUE, tracker.getWatermark());
+
+ // Verify primary: should contain what the current tracker claimed, and nothing else.
+ assertEquals(claim, primary.getPending());
+
+ // Verify residual: should contain what the current tracker didn't claim.
+ assertEquals(GrowthTracker.EMPTY_STATE, residual);
+ }
+
+ @Test
+ public void testNonPollingGrowthTrackerCheckpointEmpty() {
+ Instant now = Instant.now();
+ PollResult<String> claim =
+ PollResult.incomplete(
+ Arrays.asList(
+ TimestampedValue.of("d", now.plus(standardSeconds(4))),
+ TimestampedValue.of("c", now.plus(standardSeconds(3))),
+ TimestampedValue.of("a", now.plus(standardSeconds(1))),
+ TimestampedValue.of("b", now.plus(standardSeconds(2)))))
+ .withWatermark(now.plus(standardSeconds(7)));
+
+ GrowthTracker<String, Integer> tracker = newTracker(NonPollingGrowthState.of(claim));
+
+ NonPollingGrowthState<String> residual = (NonPollingGrowthState<String>) tracker.checkpoint();
+ GrowthState primary = tracker.currentRestriction();
+ tracker.checkDone();
+
+ // Verify primary: should contain what the current tracker claimed, and nothing else.
+ assertEquals(GrowthTracker.EMPTY_STATE, primary);
+
+ // Verify residual: should contain what the current tracker didn't claim.
+ assertEquals(claim, residual.getPending());
+ }
+
+ @Test
+ public void testNonPollingGrowthTrackerFailedToClaimOtherPollResult() {
+ Instant now = Instant.now();
+ PollResult<String> claim =
+ PollResult.incomplete(
+ Arrays.asList(
+ TimestampedValue.of("d", now.plus(standardSeconds(4))),
+ TimestampedValue.of("c", now.plus(standardSeconds(3))),
+ TimestampedValue.of("a", now.plus(standardSeconds(1))),
+ TimestampedValue.of("b", now.plus(standardSeconds(2)))))
+ .withWatermark(now.plus(standardSeconds(7)));
+
+ GrowthTracker<String, Integer> tracker = newTracker(NonPollingGrowthState.of(claim));
+
+ PollResult<String> otherClaim =
+ PollResult.incomplete(
+ Arrays.asList(
+ TimestampedValue.of("x", now.plus(standardSeconds(14))),
+ TimestampedValue.of("y", now.plus(standardSeconds(13))),
+ TimestampedValue.of("z", now.plus(standardSeconds(12)))))
+ .withWatermark(now.plus(standardSeconds(17)));
+ assertFalse(tracker.tryClaim(KV.of(otherClaim, 1)));
}
}
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java
index 54cae0d..0d2be5a 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokersTest.java
@@ -317,8 +317,8 @@
public void testDoFnWithReturn() throws Exception {
class MockFn extends DoFn<String, String> {
@DoFn.ProcessElement
- public ProcessContinuation processElement(ProcessContext c, SomeRestrictionTracker tracker)
- throws Exception {
+ public ProcessContinuation processElement(
+ ProcessContext c, RestrictionTracker<SomeRestriction, Void> tracker) throws Exception {
return null;
}
@@ -394,7 +394,8 @@
/** Public so Mockito can do "delegatesTo()" in the test below. */
public static class MockFn extends DoFn<String, String> {
@ProcessElement
- public ProcessContinuation processElement(ProcessContext c, SomeRestrictionTracker tracker) {
+ public ProcessContinuation processElement(
+ ProcessContext c, RestrictionTracker<SomeRestriction, Void> tracker) {
return null;
}
@@ -495,7 +496,7 @@
private static class DefaultTracker
extends RestrictionTracker<RestrictionWithDefaultTracker, Void> {
@Override
- protected boolean tryClaimImpl(Void position) {
+ public boolean tryClaim(Void position) {
throw new UnsupportedOperationException();
}
@@ -531,7 +532,8 @@
public void testSplittableDoFnDefaultMethods() throws Exception {
class MockFn extends DoFn<String, String> {
@ProcessElement
- public void processElement(ProcessContext c, DefaultTracker tracker) {}
+ public void processElement(
+ ProcessContext c, RestrictionTracker<RestrictionWithDefaultTracker, Void> tracker) {}
@GetInitialRestriction
public RestrictionWithDefaultTracker getInitialRestriction(String element) {
@@ -740,7 +742,8 @@
new DoFn<Integer, Integer>() {
@ProcessElement
public ProcessContinuation processElement(
- @SuppressWarnings("unused") ProcessContext c, SomeRestrictionTracker tracker) {
+ @SuppressWarnings("unused") ProcessContext c,
+ RestrictionTracker<SomeRestriction, Void> tracker) {
throw new IllegalArgumentException("bogus");
}
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java
index a9047d0..da673da 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesSplittableDoFnTest.java
@@ -111,7 +111,8 @@
public void testInfersBoundednessFromAnnotation() throws Exception {
class BaseSplittableFn extends DoFn<Integer, String> {
@ProcessElement
- public void processElement(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void processElement(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@GetInitialRestriction
public SomeRestriction getInitialRestriction(Integer element) {
@@ -138,7 +139,8 @@
private static class BaseFnWithoutContinuation extends DoFn<Integer, String> {
@ProcessElement
- public void processElement(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void processElement(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@GetInitialRestriction
public SomeRestriction getInitialRestriction(Integer element) {
@@ -149,7 +151,7 @@
private static class BaseFnWithContinuation extends DoFn<Integer, String> {
@ProcessElement
public ProcessContinuation processElement(
- ProcessContext context, SomeRestrictionTracker tracker) {
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {
return null;
}
@@ -228,7 +230,7 @@
class GoodSplittableDoFn extends DoFn<Integer, String> {
@ProcessElement
public ProcessContinuation processElement(
- ProcessContext context, SomeRestrictionTracker tracker) {
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {
return null;
}
@@ -253,7 +255,7 @@
}
DoFnSignature signature = DoFnSignatures.getSignature(GoodSplittableDoFn.class);
- assertEquals(SomeRestrictionTracker.class, signature.processElement().trackerT().getRawType());
+ assertEquals(RestrictionTracker.class, signature.processElement().trackerT().getRawType());
assertTrue(signature.processElement().isSplittable());
assertTrue(signature.processElement().hasReturnValue());
assertEquals(
@@ -299,14 +301,16 @@
DoFnSignature signature =
DoFnSignatures.getSignature(
new GoodGenericSplittableDoFn<
- SomeRestriction, SomeRestrictionTracker, SomeRestrictionCoder>() {}.getClass());
- assertEquals(SomeRestrictionTracker.class, signature.processElement().trackerT().getRawType());
+ SomeRestriction,
+ RestrictionTracker<SomeRestriction, ?>,
+ SomeRestrictionCoder>() {}.getClass());
+ assertEquals(RestrictionTracker.class, signature.processElement().trackerT().getRawType());
assertTrue(signature.processElement().isSplittable());
assertTrue(signature.processElement().hasReturnValue());
assertEquals(
SomeRestriction.class, signature.getInitialRestriction().restrictionT().getRawType());
assertEquals(SomeRestriction.class, signature.splitRestriction().restrictionT().getRawType());
- assertEquals(SomeRestrictionTracker.class, signature.newTracker().trackerT().getRawType());
+ assertEquals(RestrictionTracker.class, signature.newTracker().trackerT().getRawType());
assertEquals(SomeRestriction.class, signature.newTracker().restrictionT().getRawType());
assertEquals(SomeRestrictionCoder.class, signature.getRestrictionCoder().coderT().getRawType());
}
@@ -315,7 +319,8 @@
public void testSplittableMissingRequiredMethods() throws Exception {
class BadFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void process(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
}
thrown.expectMessage(
@@ -334,7 +339,8 @@
public void testHasDefaultTracker() throws Exception {
class Fn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext c, SomeDefaultTracker tracker) {}
+ public void process(
+ ProcessContext c, RestrictionTracker<RestrictionWithDefaultTracker, Void> tracker) {}
@GetInitialRestriction
public RestrictionWithDefaultTracker getInitialRestriction(Integer element) {
@@ -343,7 +349,7 @@
}
DoFnSignature signature = DoFnSignatures.getSignature(Fn.class);
- assertEquals(SomeDefaultTracker.class, signature.processElement().trackerT().getRawType());
+ assertEquals(RestrictionTracker.class, signature.processElement().trackerT().getRawType());
}
@Test
@@ -359,11 +365,8 @@
}
thrown.expectMessage(
- "Has tracker type SomeRestrictionTracker, but the DoFn's tracker type was inferred as ");
- thrown.expectMessage("SomeDefaultTracker");
- thrown.expectMessage(
- "from restriction type RestrictionWithDefaultTracker "
- + "of @GetInitialRestriction method getInitialRestriction(Integer)");
+ "Has tracker type SomeRestrictionTracker, "
+ + "but the DoFn's tracker type must be of type RestrictionTracker.");
DoFnSignatures.getSignature(Fn.class);
}
@@ -371,7 +374,8 @@
public void testNewTrackerReturnsWrongType() throws Exception {
class BadFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void process(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@NewTracker
public void newTracker(SomeRestriction restriction) {}
@@ -391,7 +395,8 @@
public void testGetInitialRestrictionMismatchesNewTracker() throws Exception {
class BadFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void process(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@NewTracker
public SomeRestrictionTracker newTracker(SomeRestriction restriction) {
@@ -414,7 +419,8 @@
public void testGetRestrictionCoderReturnsWrongType() throws Exception {
class BadFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void process(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@NewTracker
public SomeRestrictionTracker newTracker(SomeRestriction restriction) {
@@ -495,7 +501,8 @@
class BadFn extends DoFn<Integer, String> {
@ProcessElement
- public void process(ProcessContext context, SomeRestrictionTracker tracker) {}
+ public void process(
+ ProcessContext context, RestrictionTracker<SomeRestriction, Void> tracker) {}
@NewTracker
public SomeRestrictionTracker newTracker(SomeRestriction restriction) {
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java
index c41e627..61c28c48 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java
@@ -48,17 +48,17 @@
import org.apache.beam.sdk.extensions.gcp.auth.GcpCredentialFactory;
import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer;
import org.apache.beam.sdk.extensions.gcp.storage.PathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.DefaultValueFactory;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.InstanceBuilder;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
-import org.apache.beam.sdk.util.Transport;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.io.Files;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java
index 88e81ac..dc3d1f4 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java
@@ -28,13 +28,13 @@
import org.apache.beam.sdk.annotations.Experimental.Kind;
import org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator;
import org.apache.beam.sdk.extensions.gcp.storage.PathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
import org.apache.beam.sdk.options.ApplicationNameOptions;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.DefaultValueFactory;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.Hidden;
import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.util.GcsUtil;
import org.apache.beam.sdk.util.InstanceBuilder;
import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.MoreExecutors;
import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.ThreadFactoryBuilder;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java
index 84004bc..3fd507c 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java
@@ -38,6 +38,9 @@
import java.util.regex.Pattern;
import javax.annotation.Nullable;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.StorageObjectOrIOException;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.FileSystem;
import org.apache.beam.sdk.io.fs.CreateOptions;
import org.apache.beam.sdk.io.fs.MatchResult;
@@ -45,9 +48,6 @@
import org.apache.beam.sdk.io.fs.MatchResult.Status;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.GcsUtil.StorageObjectOrIOException;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Stopwatch;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.FluentIterable;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidator.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidator.java
index 25a9656..e109634 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidator.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidator.java
@@ -21,9 +21,9 @@
import java.io.IOException;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
/** GCP implementation of {@link PathValidator}. Only GCS paths are allowed. */
public class GcsPathValidator implements PathValidator {
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceId.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceId.java
index 41abc54..397736e 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceId.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceId.java
@@ -22,10 +22,10 @@
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkState;
import javax.annotation.Nullable;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.fs.ResolveOptions;
import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
/** {@link ResourceId} implementation for Google Cloud Storage. */
public class GcsResourceId implements ResourceId {
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/BackOffAdapter.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/BackOffAdapter.java
similarity index 94%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/BackOffAdapter.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/BackOffAdapter.java
index 4dd799d..001ef1c 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/BackOffAdapter.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/BackOffAdapter.java
@@ -15,9 +15,10 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import java.io.IOException;
+import org.apache.beam.sdk.util.BackOff;
/**
* An adapter for converting between Apache Beam and Google API client representations of backoffs.
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrors.java
similarity index 98%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrors.java
index db46d98..89089df 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrors.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.auto.value.AutoValue;
import java.io.Serializable;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
similarity index 99%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
index f06bed8..0f066b8 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkNotNull;
@@ -65,9 +65,11 @@
import java.util.regex.Pattern;
import javax.annotation.Nullable;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.DefaultValueFactory;
import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
+import org.apache.beam.sdk.util.FluentBackoff;
+import org.apache.beam.sdk.util.MoreFutures;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallCustomError.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallCustomError.java
similarity index 95%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallCustomError.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallCustomError.java
index cb95c82..96d01c4 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallCustomError.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallCustomError.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
/** Lambda interface for defining a custom error to log based on an http request and response. */
interface HttpCallCustomError {
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallMatcher.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallMatcher.java
similarity index 95%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallMatcher.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallMatcher.java
index 2437d45..fb44c7c 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallMatcher.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpCallMatcher.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
/**
* Lambda interface for inspecting an http request and response to match the failure and possibly
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpRequestWrapper.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpRequestWrapper.java
similarity index 96%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpRequestWrapper.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpRequestWrapper.java
index 068a594..ea276db 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpRequestWrapper.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpRequestWrapper.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpRequest;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpResponseWrapper.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpResponseWrapper.java
similarity index 96%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpResponseWrapper.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpResponseWrapper.java
index 3fdf780..2e4b564 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpResponseWrapper.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/HttpResponseWrapper.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.http.HttpResponse;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java
similarity index 99%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java
index 2df2e60..1f6b8d1 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static com.google.api.client.util.BackOffUtils.next;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/Transport.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java
similarity index 98%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/Transport.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java
index 0e1b6bd..5662c45 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/Transport.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpRequestInitializer;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/UploadIdResponseInterceptor.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptor.java
similarity index 97%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/UploadIdResponseInterceptor.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptor.java
index 90f27a1..7baad4a 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/UploadIdResponseInterceptor.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptor.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpResponse;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/GcsPath.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPath.java
similarity index 99%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/GcsPath.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPath.java
index 09efb2e..a0a4a52 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/GcsPath.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPath.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util.gcsfs;
+package org.apache.beam.sdk.extensions.gcp.util.gcsfs;
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Strings.isNullOrEmpty;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/package-info.java
similarity index 93%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/package-info.java
index 4d49f8c..91e8e6e 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/package-info.java
@@ -17,4 +17,4 @@
*/
/** Defines utilities used to interact with Google Cloud Storage. */
-package org.apache.beam.sdk.util.gcsfs;
+package org.apache.beam.sdk.extensions.gcp.util.gcsfs;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/package-info.java b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/package-info.java
similarity index 94%
rename from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/package-info.java
rename to sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/package-info.java
index f8135e7..1367772 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/package-info.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/package-info.java
@@ -17,4 +17,4 @@
*/
/** Defines Google Cloud Platform component utilities that can be used by Beam runners. */
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java
index 37cd2ea..5bee366 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/GcpCoreApiSurfaceTest.java
@@ -57,7 +57,8 @@
classesInPackage("java"),
classesInPackage("javax"),
classesInPackage("org.apache.beam.sdk"),
- classesInPackage("org.joda.time"));
+ classesInPackage("org.joda.time"),
+ classesInPackage("org.junit"));
assertThat(apiSurface, containsOnlyClassesMatching(allowedClasses));
}
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java
index 5c3444b..e5a0f83 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptionsTest.java
@@ -40,11 +40,11 @@
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions.DefaultProjectFactory;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpTempLocationFactory;
import org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.RestoreSystemProperties;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
import org.apache.beam.vendor.guava.v20_0.com.google.common.io.Files;
import org.junit.Before;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GoogleApiDebugOptionsTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GoogleApiDebugOptionsTest.java
index f4e7905..0ea471c 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GoogleApiDebugOptionsTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/options/GoogleApiDebugOptionsTest.java
@@ -26,8 +26,8 @@
import com.google.api.services.storage.Storage;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions.GoogleApiTracer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.util.common.ReflectHelpers;
import org.junit.Test;
import org.junit.runner.RunWith;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java
index e800758..7bd3f9d 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemTest.java
@@ -34,12 +34,12 @@
import java.util.List;
import javax.annotation.Nullable;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.StorageObjectOrIOException;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.io.fs.MatchResult.Status;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.GcsUtil.StorageObjectOrIOException;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.FluentIterable;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.junit.Before;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java
index 34fd924..924afe7 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidatorTest.java
@@ -22,9 +22,9 @@
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java
index 9bd3fae..9df76a8 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/storage/GcsResourceIdTest.java
@@ -23,12 +23,12 @@
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.io.fs.ResourceIdTester;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/CustomHttpErrorsTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrorsTest.java
similarity index 98%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/CustomHttpErrorsTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrorsTest.java
index ddb5a85..c33933b 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/CustomHttpErrorsTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/CustomHttpErrorsTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static junit.framework.TestCase.assertEquals;
import static junit.framework.TestCase.assertNull;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeper.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeper.java
similarity index 96%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeper.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeper.java
index 67ead37..dd6f92c 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeper.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeper.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.util.NanoClock;
import com.google.api.client.util.Sleeper;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeperTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeperTest.java
similarity index 97%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeperTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeperTest.java
index 03f9588..88a55e7 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/FastNanoClockAndSleeperTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/FastNanoClockAndSleeperTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilIT.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java
similarity index 96%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilIT.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java
index e428d7b..9a3ebf7 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilIT.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilIT.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
@@ -25,9 +25,9 @@
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.junit.Test;
import org.junit.runner.RunWith;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java
similarity index 98%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java
index 0f70861..759296e 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/GcsUtilTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static org.hamcrest.Matchers.contains;
import static org.hamcrest.Matchers.equalTo;
@@ -75,10 +75,11 @@
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.RewriteOp;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.StorageObjectOrIOException;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.util.GcsUtil.RewriteOp;
-import org.apache.beam.sdk.util.GcsUtil.StorageObjectOrIOException;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
+import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.junit.Rule;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/RetryHttpRequestInitializerTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java
similarity index 99%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/RetryHttpRequestInitializerTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java
index 8f15a3d..551f7bc 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/RetryHttpRequestInitializerTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializerTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/UploadIdResponseInterceptorTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java
similarity index 96%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/UploadIdResponseInterceptorTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java
index bb64527..d0bab10 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/UploadIdResponseInterceptorTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/UploadIdResponseInterceptorTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util;
+package org.apache.beam.sdk.extensions.gcp.util;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpResponse;
@@ -30,7 +30,7 @@
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
-/** A test for {@link UploadIdResponseInterceptor}. */
+/** A test for {@link org.apache.beam.sdk.extensions.gcp.util.UploadIdResponseInterceptor}. */
@RunWith(JUnit4.class)
public class UploadIdResponseInterceptorTest {
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/gcsfs/GcsPathTest.java b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java
similarity index 99%
rename from sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/gcsfs/GcsPathTest.java
rename to sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java
index 19db9e8..e7e77e3 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/gcsfs/GcsPathTest.java
+++ b/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/gcsfs/GcsPathTest.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.sdk.util.gcsfs;
+package org.apache.beam.sdk.extensions.gcp.util.gcsfs;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
diff --git a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java
new file mode 100644
index 0000000..addeb68
--- /dev/null
+++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.fn.splittabledofn;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlog;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlogs;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+
+/** Support utilities for interacting with {@link RestrictionTracker RestrictionTrackers}. */
+public class RestrictionTrackers {
+
+ /** Interface allowing a runner to observe the calls to {@link RestrictionTracker#tryClaim}. */
+ public interface ClaimObserver<PositionT> {
+ /** Called when {@link RestrictionTracker#tryClaim} returns true. */
+ void onClaimed(PositionT position);
+
+ /** Called when {@link RestrictionTracker#tryClaim} returns false. */
+ void onClaimFailed(PositionT position);
+ }
+
+ /**
+ * A {@link RestrictionTracker} which forwards all calls to the delegate {@link
+ * RestrictionTracker}.
+ */
+ @ThreadSafe
+ private static class RestrictionTrackerObserver<RestrictionT, PositionT>
+ extends RestrictionTracker<RestrictionT, PositionT> {
+ protected final RestrictionTracker<RestrictionT, PositionT> delegate;
+ private final ClaimObserver<PositionT> claimObserver;
+
+ protected RestrictionTrackerObserver(
+ RestrictionTracker<RestrictionT, PositionT> delegate,
+ ClaimObserver<PositionT> claimObserver) {
+ this.delegate = delegate;
+ this.claimObserver = claimObserver;
+ }
+
+ @Override
+ public synchronized boolean tryClaim(PositionT position) {
+ if (delegate.tryClaim(position)) {
+ claimObserver.onClaimed(position);
+ return true;
+ } else {
+ claimObserver.onClaimFailed(position);
+ return false;
+ }
+ }
+
+ @Override
+ public synchronized RestrictionT currentRestriction() {
+ return delegate.currentRestriction();
+ }
+
+ @Override
+ public synchronized RestrictionT checkpoint() {
+ return delegate.checkpoint();
+ }
+
+ @Override
+ public synchronized void checkDone() throws IllegalStateException {
+ delegate.checkDone();
+ }
+ }
+
+ /**
+ * A {@link RestrictionTracker} which forwards all calls to the delegate backlog reporting {@link
+ * RestrictionTracker}.
+ */
+ @ThreadSafe
+ private static class RestrictionTrackerObserverWithBacklog<RestrictionT, PositionT>
+ extends RestrictionTrackerObserver<RestrictionT, PositionT> implements Backlogs.HasBacklog {
+
+ protected RestrictionTrackerObserverWithBacklog(
+ RestrictionTracker<RestrictionT, PositionT> delegate,
+ ClaimObserver<PositionT> claimObserver) {
+ super(delegate, claimObserver);
+ }
+
+ @Override
+ public synchronized Backlog getBacklog() {
+ return ((Backlogs.HasBacklog) delegate).getBacklog();
+ }
+ }
+
+ /**
+ * A {@link RestrictionTracker} which forwards all calls to the delegate partitioned backlog
+ * reporting {@link RestrictionTracker}.
+ */
+ @ThreadSafe
+ private static class RestrictionTrackerObserverWithPartitionedBacklog<RestrictionT, PositionT>
+ extends RestrictionTrackerObserverWithBacklog<RestrictionT, PositionT>
+ implements Backlogs.HasPartitionedBacklog {
+
+ protected RestrictionTrackerObserverWithPartitionedBacklog(
+ RestrictionTracker<RestrictionT, PositionT> delegate,
+ ClaimObserver<PositionT> claimObserver) {
+ super(delegate, claimObserver);
+ }
+
+ @Override
+ public synchronized byte[] getBacklogPartition() {
+ return ((Backlogs.HasPartitionedBacklog) delegate).getBacklogPartition();
+ }
+ }
+
+ /**
+ * Returns a thread safe {@link RestrictionTracker} which reports all claim attempts to the
+ * specified {@link ClaimObserver}.
+ */
+ public static <RestrictionT, PositionT> RestrictionTracker<RestrictionT, PositionT> observe(
+ RestrictionTracker<RestrictionT, PositionT> restrictionTracker,
+ ClaimObserver<PositionT> claimObserver) {
+ if (restrictionTracker instanceof Backlogs.HasPartitionedBacklog) {
+ return new RestrictionTrackerObserverWithPartitionedBacklog<>(
+ restrictionTracker, claimObserver);
+ } else if (restrictionTracker instanceof Backlogs.HasBacklog) {
+ return new RestrictionTrackerObserverWithBacklog<>(restrictionTracker, claimObserver);
+ } else {
+ return new RestrictionTrackerObserver<>(restrictionTracker, claimObserver);
+ }
+ }
+}
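
A minimal usage sketch of the RestrictionTrackers API added above (not part of this PR): a delegate tracker is wrapped with observe() so every tryClaim outcome is reported to a ClaimObserver. The anonymous tracker and the printed messages are illustrative only; RestrictionTrackers, ClaimObserver, and RestrictionTracker are the types introduced or referenced in this file.

import org.apache.beam.sdk.fn.splittabledofn.RestrictionTrackers;
import org.apache.beam.sdk.fn.splittabledofn.RestrictionTrackers.ClaimObserver;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

public class RestrictionTrackersUsageSketch {
  public static void main(String[] args) {
    // A toy tracker that only accepts the position "goodClaim".
    RestrictionTracker<String, String> underlying =
        new RestrictionTracker<String, String>() {
          @Override
          public boolean tryClaim(String position) {
            return "goodClaim".equals(position);
          }

          @Override
          public String currentRestriction() {
            return "restriction";
          }

          @Override
          public String checkpoint() {
            return "residual";
          }

          @Override
          public void checkDone() {}
        };

    // Observer that simply logs claim outcomes; a runner could count or checkpoint here instead.
    ClaimObserver<String> observer =
        new ClaimObserver<String>() {
          @Override
          public void onClaimed(String position) {
            System.out.println("claimed " + position);
          }

          @Override
          public void onClaimFailed(String position) {
            System.out.println("failed to claim " + position);
          }
        };

    RestrictionTracker<String, String> observed =
        RestrictionTrackers.observe(underlying, observer);
    observed.tryClaim("goodClaim"); // observer.onClaimed is invoked
    observed.tryClaim("badClaim");  // observer.onClaimFailed is invoked
  }
}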
diff --git a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/package-info.java
similarity index 69%
copy from sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java
copy to sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/package-info.java
index 4d49f8c..0f2cbd9 100644
--- a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java
+++ b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/package-info.java
@@ -16,5 +16,13 @@
* limitations under the License.
*/
-/** Defines utilities used to interact with Google Cloud Storage. */
-package org.apache.beam.sdk.util.gcsfs;
+/**
+ * Defines utilities related to executing <a
+ * href="https://s.apache.org/splittable-do-fn">splittable</a> {@link
+ * org.apache.beam.sdk.transforms.DoFn}.
+ */
+@DefaultAnnotation(NonNull.class)
+package org.apache.beam.sdk.fn.splittabledofn;
+
+import edu.umd.cs.findbugs.annotations.DefaultAnnotation;
+import edu.umd.cs.findbugs.annotations.NonNull;
diff --git a/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java
new file mode 100644
index 0000000..c3bb289
--- /dev/null
+++ b/sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackersTest.java
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.fn.splittabledofn;
+
+import static org.hamcrest.Matchers.contains;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertThat;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.fn.splittabledofn.RestrictionTrackers.ClaimObserver;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlog;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlogs;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link RestrictionTrackers}. */
+@RunWith(JUnit4.class)
+public class RestrictionTrackersTest {
+ @Test
+ public void testObservingClaims() {
+ RestrictionTracker<String, String> observedTracker =
+ new RestrictionTracker() {
+
+ @Override
+ public boolean tryClaim(Object position) {
+ return "goodClaim".equals(position);
+ }
+
+ @Override
+ public Object currentRestriction() {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public Object checkpoint() {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void checkDone() throws IllegalStateException {
+ throw new UnsupportedOperationException();
+ }
+ };
+
+ List<String> positionsObserved = new ArrayList<>();
+ ClaimObserver<String> observer =
+ new ClaimObserver<String>() {
+
+ @Override
+ public void onClaimed(String position) {
+ positionsObserved.add(position);
+ assertEquals("goodClaim", position);
+ }
+
+ @Override
+ public void onClaimFailed(String position) {
+ positionsObserved.add(position);
+ }
+ };
+
+ RestrictionTracker<String, String> observingTracker =
+ RestrictionTrackers.observe(observedTracker, observer);
+ observingTracker.tryClaim("goodClaim");
+ observingTracker.tryClaim("badClaim");
+
+ assertThat(positionsObserved, contains("goodClaim", "badClaim"));
+ }
+
+ private static class RestrictionTrackerWithBacklog extends RestrictionTracker<Object, Object>
+ implements Backlogs.HasBacklog {
+
+ @Override
+ public Backlog getBacklog() {
+ return null;
+ }
+
+ @Override
+ public boolean tryClaim(Object position) {
+ return false;
+ }
+
+ @Override
+ public Object currentRestriction() {
+ return null;
+ }
+
+ @Override
+ public Object checkpoint() {
+ return null;
+ }
+
+ @Override
+ public void checkDone() throws IllegalStateException {}
+ }
+
+ private static class RestrictionTrackerWithBacklogPartitionedBacklog
+ extends RestrictionTracker<Object, Object> implements Backlogs.HasPartitionedBacklog {
+
+ @Override
+ public Backlog getBacklog() {
+ return null;
+ }
+
+ @Override
+ public boolean tryClaim(Object position) {
+ return false;
+ }
+
+ @Override
+ public Object currentRestriction() {
+ return null;
+ }
+
+ @Override
+ public Object checkpoint() {
+ return null;
+ }
+
+ @Override
+ public void checkDone() throws IllegalStateException {}
+
+ @Override
+ public byte[] getBacklogPartition() {
+ return null;
+ }
+ }
+
+ @Test
+ public void testClaimObserversMaintainBacklogInterfaces() {
+ RestrictionTracker hasBacklog =
+ RestrictionTrackers.observe(new RestrictionTrackerWithBacklog(), null);
+ assertThat(hasBacklog, instanceOf(Backlogs.HasBacklog.class));
+ RestrictionTracker hasPartitionedBacklog =
+ RestrictionTrackers.observe(new RestrictionTrackerWithBacklogPartitionedBacklog(), null);
+ assertThat(hasPartitionedBacklog, instanceOf(Backlogs.HasPartitionedBacklog.class));
+ }
+}
diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/SplittableProcessElementsRunner.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/SplittableProcessElementsRunner.java
index 930aa28..0b13408 100644
--- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/SplittableProcessElementsRunner.java
+++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/SplittableProcessElementsRunner.java
@@ -156,8 +156,7 @@
processElementTyped(elem);
}
- private <PositionT, TrackerT extends RestrictionTracker<RestrictionT, PositionT>>
- void processElementTyped(WindowedValue<KV<InputT, RestrictionT>> elem) {
+ private <PositionT> void processElementTyped(WindowedValue<KV<InputT, RestrictionT>> elem) {
checkArgument(
elem.getWindows().size() == 1,
"SPLITTABLE_PROCESS_ELEMENTS expects its input to be in 1 window, but got %s windows",
@@ -175,9 +174,9 @@
(Coder<BoundedWindow>) context.windowCoder,
() -> elem,
() -> window);
- TrackerT tracker = doFnInvoker.invokeNewTracker(elem.getValue().getValue());
- OutputAndTimeBoundedSplittableProcessElementInvoker<
- InputT, OutputT, RestrictionT, PositionT, TrackerT>
+ RestrictionTracker<RestrictionT, PositionT> tracker =
+ doFnInvoker.invokeNewTracker(elem.getValue().getValue());
+ OutputAndTimeBoundedSplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT>
processElementInvoker =
new OutputAndTimeBoundedSplittableProcessElementInvoker<>(
context.doFn,
@@ -213,7 +212,7 @@
executor,
10000,
Duration.standardSeconds(10));
- SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, TrackerT>.Result result =
+ SplittableProcessElementInvoker<InputT, OutputT, RestrictionT, PositionT>.Result result =
processElementInvoker.invokeProcessElement(doFnInvoker, element, tracker);
this.stateAccessor = null;
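
A toy illustration (not Beam code; names are hypothetical) of the generics cleanup in the hunk above: once callers only interact with trackers through the RestrictionTracker<RestrictionT, PositionT> supertype, the extra TrackerT bound carries no information and can be dropped.

import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

public class TrackerGenericsSketch {
  // Before: an extra TrackerT type parameter that callers never rely on.
  static <RestrictionT, PositionT, TrackerT extends RestrictionTracker<RestrictionT, PositionT>>
      RestrictionT restrictionOfOld(TrackerT tracker) {
    return tracker.currentRestriction();
  }

  // After: the supertype is enough, so only RestrictionT and PositionT remain.
  static <RestrictionT, PositionT> RestrictionT restrictionOfNew(
      RestrictionTracker<RestrictionT, PositionT> tracker) {
    return tracker.currentRestriction();
  }
}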
diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiver.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiver.java
index 3f47122..f234844 100644
--- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiver.java
+++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiver.java
@@ -21,8 +21,9 @@
import java.util.HashMap;
import org.apache.beam.runners.core.metrics.LabeledMetrics;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants.Labels;
import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName;
-import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.sdk.fn.data.FnDataReceiver;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricsContainer;
@@ -47,9 +48,9 @@
MetricsContainerStepMap metricContainerRegistry) {
this.original = original;
HashMap<String, String> labels = new HashMap<String, String>();
- labels.put(SimpleMonitoringInfoBuilder.PCOLLECTION_LABEL, pCollection);
+ labels.put(Labels.PCOLLECTION, pCollection);
MonitoringInfoMetricName metricName =
- MonitoringInfoMetricName.named(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN, labels);
+ MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels);
this.counter = LabeledMetrics.counter(metricName);
// Collect the metric in a metric container which is not bound to the step name.
// This is required to count elements from impulse steps, which will produce elements outside
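
A minimal sketch of the constant migration in this file: metric labels and URNs now come from MonitoringInfoConstants instead of SimpleMonitoringInfoBuilder. The calls mirror the ones in the hunk above; the PCollection id passed in main is made up.

import java.util.HashMap;
import org.apache.beam.runners.core.metrics.LabeledMetrics;
import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName;
import org.apache.beam.sdk.metrics.Counter;

public class ElementCountCounterSketch {
  static Counter elementCountCounterFor(String pCollectionId) {
    HashMap<String, String> labels = new HashMap<>();
    // Label the metric with the PCollection whose elements it counts.
    labels.put(MonitoringInfoConstants.Labels.PCOLLECTION, pCollectionId);
    MonitoringInfoMetricName metricName =
        MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, labels);
    return LabeledMetrics.counter(metricName);
  }

  public static void main(String[] args) {
    Counter counter = elementCountCounterFor("someHypotheticalPCollectionId");
    counter.inc(); // one element observed
  }
}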
diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
index 7bee753..0655d9e 100644
--- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
+++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
@@ -26,8 +26,8 @@
import org.apache.beam.runners.core.metrics.ExecutionStateTracker;
import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SimpleExecutionState;
-import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.runners.core.metrics.SimpleStateRegistry;
import org.apache.beam.sdk.fn.data.FnDataReceiver;
import org.apache.beam.sdk.metrics.MetricsEnvironment;
@@ -86,11 +86,11 @@
}
HashMap<String, String> labelsMetadata = new HashMap<String, String>();
- labelsMetadata.put(SimpleMonitoringInfoBuilder.PTRANSFORM_LABEL, pTransformId);
+ labelsMetadata.put(MonitoringInfoConstants.Labels.PTRANSFORM, pTransformId);
SimpleExecutionState state =
new SimpleExecutionState(
ExecutionStateTracker.PROCESS_STATE_NAME,
- SimpleMonitoringInfoBuilder.PROCESS_BUNDLE_MSECS_URN,
+ MonitoringInfoConstants.Urns.PROCESS_BUNDLE_MSECS,
labelsMetadata);
executionStates.register(state);
// Wrap the consumer with extra logic to set the metric container with the appropriate
diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java
index 6b20fe5..26f1ade 100644
--- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java
+++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PTransformFunctionRegistry.java
@@ -25,8 +25,8 @@
import org.apache.beam.runners.core.metrics.ExecutionStateTracker;
import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SimpleExecutionState;
-import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.runners.core.metrics.SimpleStateRegistry;
import org.apache.beam.sdk.function.ThrowingRunnable;
import org.apache.beam.sdk.metrics.MetricsEnvironment;
@@ -88,12 +88,12 @@
*/
public void register(String pTransformId, ThrowingRunnable runnable) {
HashMap<String, String> labelsMetadata = new HashMap<String, String>();
- labelsMetadata.put(SimpleMonitoringInfoBuilder.PTRANSFORM_LABEL, pTransformId);
+ labelsMetadata.put(MonitoringInfoConstants.Labels.PTRANSFORM, pTransformId);
String executionTimeUrn = "";
if (executionStateName.equals(ExecutionStateTracker.START_STATE_NAME)) {
- executionTimeUrn = SimpleMonitoringInfoBuilder.START_BUNDLE_MSECS_URN;
+ executionTimeUrn = MonitoringInfoConstants.Urns.START_BUNDLE_MSECS;
} else if (executionStateName.equals(ExecutionStateTracker.FINISH_STATE_NAME)) {
- executionTimeUrn = SimpleMonitoringInfoBuilder.FINISH_BUNDLE_MSECS_URN;
+ executionTimeUrn = MonitoringInfoConstants.Urns.FINISH_BUNDLE_MSECS;
}
SimpleExecutionState state =
diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
index 69684e7..2e8b33c 100644
--- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
+++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
@@ -48,6 +48,7 @@
import org.apache.beam.runners.core.metrics.MetricUpdates.MetricUpdate;
import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
@@ -663,13 +664,13 @@
List<MonitoringInfo> expected = new ArrayList<MonitoringInfo>();
SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel("Window.Into()/Window.Assign.out");
builder.setInt64Value(2);
expected.add(builder.build());
builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel(
"pTransformId/ParMultiDo(TestSideInputIsAccessibleForDownstreamCallers).output");
builder.setInt64Value(2);
diff --git a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiverTest.java b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiverTest.java
index 5490495..b6768e1 100644
--- a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiverTest.java
+++ b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/ElementCountFnDataReceiverTest.java
@@ -26,6 +26,7 @@
import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
+import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;
import org.apache.beam.sdk.fn.data.FnDataReceiver;
import org.apache.beam.sdk.metrics.MetricsEnvironment;
@@ -63,7 +64,7 @@
verify(consumer, times(numElements)).accept(element);
SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
- builder.setUrn(SimpleMonitoringInfoBuilder.ELEMENT_COUNT_URN);
+ builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT);
builder.setPCollectionLabel(pCollectionA);
builder.setInt64Value(numElements);
MonitoringInfo expected = builder.build();
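
For completeness, a small sketch of how these tests assemble the expected MonitoringInfo with SimpleMonitoringInfoBuilder now that the URN constants live in MonitoringInfoConstants; the PCollection label and count are placeholders.

import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.SimpleMonitoringInfoBuilder;

public class ExpectedMonitoringInfoSketch {
  static MonitoringInfo expectedElementCount(String pCollection, long count) {
    SimpleMonitoringInfoBuilder builder = new SimpleMonitoringInfoBuilder();
    builder.setUrn(MonitoringInfoConstants.Urns.ELEMENT_COUNT); // URN constant moved here
    builder.setPCollectionLabel(pCollection);
    builder.setInt64Value(count);
    return builder.build();
  }
}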
diff --git a/sdks/java/io/cassandra/build.gradle b/sdks/java/io/cassandra/build.gradle
index c68c53a..7876b47 100644
--- a/sdks/java/io/cassandra/build.gradle
+++ b/sdks/java/io/cassandra/build.gradle
@@ -17,23 +17,33 @@
*/
plugins { id 'org.apache.beam.module' }
+
// Do not relocate guava to avoid issues with Cassandra's version.
-applyJavaNature(shadowClosure: DEFAULT_SHADOW_CLOSURE << {
- dependencies {
- exclude(dependency(project.library.java.guava))
- }
-})
+applyJavaNature(
+ shadowClosure: {
+ dependencies {
+            // including guava explicitly is nominally the default, but when include() is omitted the shadow plugin's default action is to include all runtime deps
+ include(dependency(project.library.java.guava))
+
+ // hack: now exclude the only thing that was included
+ exclude(dependency(project.library.java.guava))
+
+ }
+ },
+ shadowJarValidationExcludes: [
+ "org/apache/beam/**",
+ "com/google/common/**",
+ "com/google/thirdparty/**"
+ ]
+)
provideIntegrationTestingDependencies()
enableJavaPerformanceTesting()
description = "Apache Beam :: SDKs :: Java :: IO :: Cassandra"
ext.summary = "IO to read and write with Apache Cassandra database"
-def cassandra_embedded_version = "3.5.0.1"
-
-configurations.testRuntimeClasspath {
- exclude group: "org.slf4j", module: "slf4j-jdk14"
-}
+// compatible with all Cassandra versions up to 3.11.3
+def achilles_version = "6.0.2"
dependencies {
shadow library.java.vendored_guava_20_0
@@ -48,5 +58,8 @@
testCompile library.java.hamcrest_library
testCompile library.java.slf4j_jdk14
testCompile library.java.mockito_core
- testCompile group: 'org.cassandraunit', name: 'cassandra-unit', version: "$cassandra_embedded_version"
+
+ // for embedded cassandra
+ testCompile group: 'info.archinnov', name: 'achilles-junit', version: "$achilles_version"
+ testCompile library.java.jackson_jaxb_annotations
}
diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
index b92cfab..ea26769 100644
--- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
+++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
@@ -59,6 +59,7 @@
import org.apache.beam.sdk.values.PDone;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Joiner;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterators;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -739,7 +740,7 @@
@AutoValue
public abstract static class Write<T> extends PTransform<PCollection<T>, PDone> {
@Nullable
- abstract List<String> hosts();
+ abstract ImmutableList<String> hosts();
@Nullable
abstract Integer port();
diff --git a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java
index a47f4cf..de49421 100644
--- a/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java
+++ b/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/SplitGenerator.java
@@ -176,6 +176,6 @@
}
private BigInteger getTargetSplitSize(long splitCount) {
- return (rangeMax.subtract(rangeMin)).divide(BigInteger.valueOf(splitCount));
+ return rangeMax.subtract(rangeMin).divide(BigInteger.valueOf(splitCount));
}
}
diff --git a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java
index bce119a..e705440 100644
--- a/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java
+++ b/sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java
@@ -23,8 +23,10 @@
import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.getRingFraction;
import static org.apache.beam.sdk.io.cassandra.CassandraIO.CassandraSource.isMurmur3Partitioner;
import static org.apache.beam.sdk.testing.SourceTestUtils.readFromSource;
+import static org.hamcrest.Matchers.lessThan;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertThat;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
@@ -35,8 +37,12 @@
import com.datastax.driver.mapping.annotations.Computed;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
+import info.archinnov.achilles.embedded.CassandraEmbeddedServerBuilder;
+import info.archinnov.achilles.embedded.CassandraShutDownHook;
+import java.io.IOException;
import java.io.Serializable;
import java.math.BigInteger;
+import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
@@ -70,13 +76,13 @@
import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.ListeningExecutorService;
import org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.MoreExecutors;
import org.apache.cassandra.service.StorageServiceMBean;
-import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.junit.AfterClass;
import org.junit.Assert;
-import org.junit.Before;
import org.junit.BeforeClass;
+import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
import org.slf4j.Logger;
@@ -88,63 +94,70 @@
private static final long NUM_ROWS = 20L;
private static final String CASSANDRA_KEYSPACE = "beam_ks";
private static final String CASSANDRA_HOST = "127.0.0.1";
- private static final int CASSANDRA_PORT = 9142;
private static final String CASSANDRA_TABLE = "scientist";
private static final Logger LOGGER = LoggerFactory.getLogger(CassandraIOTest.class);
private static final String STORAGE_SERVICE_MBEAN = "org.apache.cassandra.db:type=StorageService";
- private static final String JMX_PORT = "7199";
- private static final long SIZE_ESTIMATES_UPDATE_INTERVAL = 5000L;
- private static final long STARTUP_TIMEOUT = 45000L;
+ private static final float ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE = 0.5f;
+ private static final int FLUSH_TIMEOUT = 30000;
+ private static int jmxPort;
+ private static int cassandraPort;
private static Cluster cluster;
private static Session session;
- private static long startupTime;
+ @ClassRule public static final TemporaryFolder TEMPORARY_FOLDER = new TemporaryFolder();
@Rule public transient TestPipeline pipeline = TestPipeline.create();
+ private static CassandraShutDownHook shutdownHook;
@BeforeClass
public static void startCassandra() throws Exception {
- System.setProperty("cassandra.jmx.local.port", JMX_PORT);
- startupTime = System.currentTimeMillis();
- EmbeddedCassandraServerHelper.startEmbeddedCassandra(STARTUP_TIMEOUT);
- cluster = EmbeddedCassandraServerHelper.getCluster();
- session = EmbeddedCassandraServerHelper.getSession();
+ jmxPort = getFreeLocalPort();
+ shutdownHook = new CassandraShutDownHook();
+ // randomized port at startup
+ String data = TEMPORARY_FOLDER.newFolder("embedded-cassandra", "data").getPath();
+ String commitLog = TEMPORARY_FOLDER.newFolder("embedded-cassandra", "commit-log").getPath();
+ String cdcRaw = TEMPORARY_FOLDER.newFolder("embedded-cassandra", "cdc-raw").getPath();
+ String hints = TEMPORARY_FOLDER.newFolder("embedded-cassandra", "hints").getPath();
+ String savedCache = TEMPORARY_FOLDER.newFolder("embedded-cassandra", "saved-cache").getPath();
+ cluster =
+ CassandraEmbeddedServerBuilder.builder()
+ .withKeyspaceName(CASSANDRA_KEYSPACE)
+ .withDataFolder(data)
+ .withCommitLogFolder(commitLog)
+ .withCdcRawFolder(cdcRaw)
+ .withHintsFolder(hints)
+ .withSavedCachesFolder(savedCache)
+ .withShutdownHook(shutdownHook)
+ .withJMXPort(jmxPort)
+ .buildNativeCluster();
- LOGGER.info("Creating the Cassandra keyspace");
- session.execute(
- "CREATE KEYSPACE IF NOT EXISTS "
- + CASSANDRA_KEYSPACE
- + " WITH REPLICATION = "
- + "{'class':'SimpleStrategy', 'replication_factor':3};");
- LOGGER.info(CASSANDRA_KEYSPACE + " keyspace created");
+ cassandraPort = cluster.getConfiguration().getProtocolOptions().getPort();
+ session = CassandraIOTest.cluster.newSession();
- LOGGER.info("Use the Cassandra keyspace");
- session.execute("USE " + CASSANDRA_KEYSPACE);
-
- LOGGER.info("Create Cassandra table");
+ LOGGER.info("Create Cassandra tables");
session.execute(
String.format(
- "CREATE TABLE IF NOT EXISTS %s(person_id int, person_name text, PRIMARY KEY"
+ "CREATE TABLE IF NOT EXISTS %s.%s(person_id int, person_name text, PRIMARY KEY"
+ "(person_id));",
- CASSANDRA_TABLE));
+ CASSANDRA_KEYSPACE, CASSANDRA_TABLE));
+ session.execute(
+ String.format(
+ "CREATE TABLE IF NOT EXISTS %s.%s(person_id int, person_name text, PRIMARY KEY"
+ + "(person_id));",
+ CASSANDRA_KEYSPACE, CASSANDRA_TABLE_WRITE));
+ insertRecords();
+ }
+
+ private static int getFreeLocalPort() throws IOException {
+ ServerSocket serverSocket = new ServerSocket(0);
+ int port = serverSocket.getLocalPort();
+ serverSocket.close();
+ return port;
}
@AfterClass
- public static void stopCassandra() {
- if (cluster != null && session != null) {
- EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
- session.close();
- cluster.close();
- } else {
- if (cluster != null) {
- cluster.close();
- }
- }
- }
-
- @Before
- public void purgeCassandra() {
- session.execute(String.format("TRUNCATE TABLE %s.%s", CASSANDRA_KEYSPACE, CASSANDRA_TABLE));
+ public static void stopCassandra() throws InterruptedException {
+ shutdownHook.shutDownNow();
}
private static void insertRecords() throws Exception {
@@ -186,10 +199,13 @@
* https://github.com/apache/cassandra/blob/cassandra-3.X
* /src/java/org/apache/cassandra/tools/nodetool/Flush.java
*/
+ @SuppressWarnings("unused")
private static void flushMemTables() throws Exception {
JMXServiceURL url =
new JMXServiceURL(
- String.format("service:jmx:rmi:///jndi/rmi://%s:%s/jmxrmi", CASSANDRA_HOST, JMX_PORT));
+ String.format(
+ "service:jmx:rmi://%s/jndi/rmi://%s:%s/jmxrmi",
+ CASSANDRA_HOST, CASSANDRA_HOST, jmxPort));
JMXConnector jmxConnector = JMXConnectorFactory.connect(url, null);
MBeanServerConnection mBeanServerConnection = jmxConnector.getMBeanServerConnection();
ObjectName objectName = new ObjectName(STORAGE_SERVICE_MBEAN);
@@ -197,36 +213,31 @@
JMX.newMBeanProxy(mBeanServerConnection, objectName, StorageServiceMBean.class);
mBeanProxy.forceKeyspaceFlush(CASSANDRA_KEYSPACE, CASSANDRA_TABLE);
jmxConnector.close();
- // same method of waiting than cassandra spark connector
- long initialDelay = Math.max(startupTime + STARTUP_TIMEOUT - System.currentTimeMillis(), 0L);
- Thread.sleep(initialDelay + 2 * SIZE_ESTIMATES_UPDATE_INTERVAL);
+ Thread.sleep(FLUSH_TIMEOUT);
}
@Test
public void testEstimatedSizeBytes() throws Exception {
- insertRecords();
PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
CassandraIO.Read<Scientist> read =
CassandraIO.<Scientist>read()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withTable(CASSANDRA_TABLE);
CassandraIO.CassandraSource<Scientist> source = new CassandraIO.CassandraSource<>(read, null);
long estimatedSizeBytes = source.getEstimatedSizeBytes(pipelineOptions);
// the size is non-deterministic in Cassandra backend
- assertTrue((estimatedSizeBytes >= 4608L * 0.9f) && (estimatedSizeBytes <= 4608L * 1.1f));
+ assertTrue((estimatedSizeBytes >= 12960L * 0.9f) && (estimatedSizeBytes <= 12960L * 1.1f));
}
@Test
public void testRead() throws Exception {
- insertRecords();
-
PCollection<Scientist> output =
pipeline.apply(
CassandraIO.<Scientist>read()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withTable(CASSANDRA_TABLE)
.withCoder(SerializableCoder.of(Scientist.class))
@@ -257,13 +268,11 @@
@Test
public void testReadWithQuery() throws Exception {
- insertRecords();
-
PCollection<Scientist> output =
pipeline.apply(
CassandraIO.<Scientist>read()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withTable(CASSANDRA_TABLE)
.withQuery(
@@ -287,9 +296,9 @@
@Test
public void testWrite() {
- ArrayList<Scientist> data = new ArrayList<>();
+ ArrayList<ScientistWrite> data = new ArrayList<>();
for (int i = 0; i < NUM_ROWS; i++) {
- Scientist scientist = new Scientist();
+ ScientistWrite scientist = new ScientistWrite();
scientist.id = i;
scientist.name = "Name " + i;
data.add(scientist);
@@ -298,15 +307,15 @@
pipeline
.apply(Create.of(data))
.apply(
- CassandraIO.<Scientist>write()
+ CassandraIO.<ScientistWrite>write()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
- .withEntity(Scientist.class));
- // table to write to is specified in the entity in @Table annotation (in that cas person)
+ .withEntity(ScientistWrite.class));
+ // table to write to is specified in the entity in @Table annotation (in that case scientist)
pipeline.run();
- List<Row> results = getRows();
+ List<Row> results = getRows(CASSANDRA_TABLE_WRITE);
assertEquals(NUM_ROWS, results.size());
for (Row row : results) {
assertTrue(row.getString("person_name").matches("Name (\\d*)"));
@@ -353,7 +362,6 @@
@Test
public void testReadWithMapper() throws Exception {
- insertRecords();
counter.set(0);
SerializableFunction<Session, Mapper> factory = new NOOPMapperFactory();
@@ -361,7 +369,7 @@
pipeline.apply(
CassandraIO.<String>read()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withTable(CASSANDRA_TABLE)
.withCoder(SerializableCoder.of(String.class))
@@ -374,7 +382,6 @@
@Test
public void testCustomMapperImplWrite() throws Exception {
- insertRecords();
counter.set(0);
SerializableFunction<Session, Mapper> factory = new NOOPMapperFactory();
@@ -384,7 +391,7 @@
.apply(
CassandraIO.<String>write()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withMapperFactoryFn(factory)
.withEntity(String.class));
@@ -394,8 +401,7 @@
}
@Test
- public void testCustomMapperImplDelete() throws Exception {
- insertRecords();
+ public void testCustomMapperImplDelete() {
counter.set(0);
SerializableFunction<Session, Mapper> factory = new NOOPMapperFactory();
@@ -403,9 +409,9 @@
pipeline
.apply(Create.of(""))
.apply(
- CassandraIO.<String>write()
+ CassandraIO.<String>delete()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withMapperFactoryFn(factory)
.withEntity(String.class));
@@ -416,12 +422,11 @@
@Test
public void testSplit() throws Exception {
- insertRecords();
PipelineOptions options = PipelineOptionsFactory.create();
CassandraIO.Read<Scientist> read =
CassandraIO.<Scientist>read()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withTable(CASSANDRA_TABLE)
.withEntity(Scientist.class)
@@ -432,34 +437,35 @@
String splitQuery = QueryBuilder.select().from(CASSANDRA_KEYSPACE, CASSANDRA_TABLE).toString();
CassandraIO.CassandraSource<Scientist> initialSource =
new CassandraIO.CassandraSource<>(read, Collections.singletonList(splitQuery));
-
- int desiredBundleSizeBytes = 2000;
+ int desiredBundleSizeBytes = 2048;
List<BoundedSource<Scientist>> splits = initialSource.split(desiredBundleSizeBytes, options);
SourceTestUtils.assertSourcesEqualReferenceSource(initialSource, splits, options);
- int expectedNumSplits =
- (int) initialSource.getEstimatedSizeBytes(options) / desiredBundleSizeBytes;
- assertEquals(expectedNumSplits, splits.size());
- int nonEmptySplits = 0;
+ float expectedNumSplitsFloat =
+ (float) initialSource.getEstimatedSizeBytes(options) / desiredBundleSizeBytes;
+ int expectedNumSplits = (int) Math.ceil(expectedNumSplitsFloat);
+ assertEquals("Wrong number of splits", expectedNumSplits, splits.size());
+ int emptySplits = 0;
for (BoundedSource<Scientist> subSource : splits) {
- if (readFromSource(subSource, options).size() > 0) {
- nonEmptySplits += 1;
+ if (readFromSource(subSource, options).isEmpty()) {
+ emptySplits += 1;
}
}
- assertEquals("Wrong number of empty splits", expectedNumSplits, nonEmptySplits);
+ assertThat(
+ "There are too many empty splits, parallelism is sub-optimal",
+ emptySplits,
+ lessThan((int) (ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE * splits.size())));
}
- private List<Row> getRows() {
+ private List<Row> getRows(String table) {
ResultSet result =
session.execute(
- String.format(
- "select person_id,person_name from %s.%s", CASSANDRA_KEYSPACE, CASSANDRA_TABLE));
+ String.format("select person_id,person_name from %s.%s", CASSANDRA_KEYSPACE, table));
return result.all();
}
@Test
public void testDelete() throws Exception {
- insertRecords();
- List<Row> results = getRows();
+ List<Row> results = getRows(CASSANDRA_TABLE);
assertEquals(NUM_ROWS, results.size());
Scientist einstein = new Scientist();
@@ -470,13 +476,23 @@
.apply(
CassandraIO.<Scientist>delete()
.withHosts(Collections.singletonList(CASSANDRA_HOST))
- .withPort(CASSANDRA_PORT)
+ .withPort(cassandraPort)
.withKeyspace(CASSANDRA_KEYSPACE)
.withEntity(Scientist.class));
pipeline.run();
- results = getRows();
+ results = getRows(CASSANDRA_TABLE);
assertEquals(NUM_ROWS - 1, results.size());
+ // re-insert the deleted record so the test remains self-contained
+ session.execute(
+ String.format(
+ "INSERT INTO %s.%s(person_id, person_name) values("
+ + einstein.id
+ + ", '"
+ + einstein.name
+ + "');",
+ CASSANDRA_KEYSPACE,
+ CASSANDRA_TABLE));
}
@Test
@@ -531,7 +547,7 @@
assertEquals(8000, getEstimatedSizeBytesFromTokenRanges(tokenRanges));
}
- /** Simple Cassandra entity used in test. */
+ /** Simple Cassandra entity used in read tests. */
@Table(name = CASSANDRA_TABLE, keyspace = CASSANDRA_KEYSPACE)
static class Scientist implements Serializable {
@@ -567,4 +583,9 @@
return Objects.hashCode(name, id);
}
}
+
+ private static final String CASSANDRA_TABLE_WRITE = "scientist_write";
+ /** Simple Cassandra entity used in write tests. */
+ @Table(name = CASSANDRA_TABLE_WRITE, keyspace = CASSANDRA_KEYSPACE)
+ static class ScientistWrite extends Scientist {}
}
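
Editor's note: the test above replaces the fixed Cassandra and JMX ports with ports resolved at runtime, which avoids clashes when several embedded instances share a machine. A minimal standalone sketch of that pattern follows; the entity class, keyspace, and table names are placeholders rather than the ones used in the test.

    import java.io.IOException;
    import java.io.Serializable;
    import java.net.ServerSocket;
    import java.util.Collections;
    import org.apache.beam.sdk.io.cassandra.CassandraIO;

    class FreePortReadSketch {
      /** Placeholder entity; a real one would carry an @Table annotation. */
      static class MyEntity implements Serializable {}

      /** Ask the OS for any free port (port 0) and release it immediately. */
      static int getFreeLocalPort() throws IOException {
        try (ServerSocket serverSocket = new ServerSocket(0)) {
          return serverSocket.getLocalPort();
        }
      }

      static CassandraIO.Read<MyEntity> readOnPort(int port) {
        return CassandraIO.<MyEntity>read()
            .withHosts(Collections.singletonList("127.0.0.1"))
            .withPort(port) // resolved at runtime instead of a hard-coded constant
            .withKeyspace("my_ks")
            .withTable("my_table")
            .withEntity(MyEntity.class);
      }
    }
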
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
index f75ba17..6f3bc0a 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
@@ -34,6 +34,7 @@
import org.apache.beam.sdk.coders.ShardedKeyCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VoidCoder;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.Result;
@@ -57,7 +58,6 @@
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
index 8e90467..82bb5ad 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
@@ -37,13 +37,13 @@
import java.util.UUID;
import java.util.regex.Matcher;
import javax.annotation.Nullable;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResolveOptions;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
import org.apache.beam.sdk.transforms.SerializableFunction;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 85cdba4..87e1ddb 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -55,10 +55,11 @@
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.FileSystems;
-import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.io.fs.MoveOptions;
import org.apache.beam.sdk.io.fs.ResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;
@@ -93,8 +94,6 @@
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.display.DisplayData;
-import org.apache.beam.sdk.util.Transport;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
index 48af213..4506f64 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
@@ -53,4 +53,22 @@
Integer getInsertBundleParallelism();
void setInsertBundleParallelism(Integer parallelism);
+
+ @Description("The number of keys used per table when doing streaming inserts to BigQuery.")
+ @Default.Integer(50)
+ Integer getNumStreamingKeys();
+
+ void setNumStreamingKeys(Integer value);
+
+ @Description("The maximum number of rows to batch in a single streaming insert to BigQuery.")
+ @Default.Long(500)
+ Long getMaxStreamingRowsToBatch();
+
+ void setMaxStreamingRowsToBatch(Long value);
+
+ @Description("The maximum byte size of a single streaming insert to BigQuery.")
+ @Default.Long(64L * 1024L)
+ Long getMaxStreamingBatchSize();
+
+ void setMaxStreamingBatchSize(Long value);
}
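
Editor's note: the three options added above expose constants that were previously hard-coded in BigQueryServicesImpl (50 shard keys, 500 rows, and 64 KB per streaming insert). Below is a small sketch of overriding them programmatically; the concrete values are illustrative only. Since these are ordinary pipeline options, they should also be settable from the command line (e.g. --numStreamingKeys=100).

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    class StreamingInsertTuningSketch {
      static BigQueryOptions tunedOptions() {
        BigQueryOptions options = PipelineOptionsFactory.create().as(BigQueryOptions.class);
        options.setNumStreamingKeys(100);              // shard writes across more keys per table
        options.setMaxStreamingRowsToBatch(250L);      // fewer rows per InsertAll request
        options.setMaxStreamingBatchSize(32L * 1024L); // cap each request at roughly 32 KB
        return options;
      }
    }
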
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
index 3d3164e..d01add4 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
@@ -74,14 +74,14 @@
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.CustomHttpErrors;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.SerializableFunction;
-import org.apache.beam.sdk.util.BackOffAdapter;
-import org.apache.beam.sdk.util.CustomHttpErrors;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.ReleaseInfo;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
@@ -376,12 +376,6 @@
@VisibleForTesting
static class DatasetServiceImpl implements DatasetService {
- // Approximate amount of table data to upload per InsertAll request.
- private static final long UPLOAD_BATCH_SIZE_BYTES = 64L * 1024L;
-
- // The maximum number of rows to upload per InsertAll request.
- private static final long MAX_ROWS_PER_BATCH = 500;
-
private static final FluentBackoff INSERT_BACKOFF_FACTORY =
FluentBackoff.DEFAULT.withInitialBackoff(Duration.millis(200)).withMaxRetries(5);
@@ -395,24 +389,29 @@
private final Bigquery client;
private final PipelineOptions options;
private final long maxRowsPerBatch;
+ private final long maxRowBatchSize;
private ExecutorService executor;
@VisibleForTesting
DatasetServiceImpl(Bigquery client, PipelineOptions options) {
+ BigQueryOptions bqOptions = options.as(BigQueryOptions.class);
this.errorExtractor = new ApiErrorExtractor();
this.client = client;
this.options = options;
- this.maxRowsPerBatch = MAX_ROWS_PER_BATCH;
+ this.maxRowsPerBatch = bqOptions.getMaxStreamingRowsToBatch();
+ this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize();
this.executor = null;
}
@VisibleForTesting
DatasetServiceImpl(Bigquery client, PipelineOptions options, long maxRowsPerBatch) {
+ BigQueryOptions bqOptions = options.as(BigQueryOptions.class);
this.errorExtractor = new ApiErrorExtractor();
this.client = client;
this.options = options;
this.maxRowsPerBatch = maxRowsPerBatch;
+ this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize();
this.executor = null;
}
@@ -420,7 +419,8 @@
this.errorExtractor = new ApiErrorExtractor();
this.client = newBigQueryClient(bqOptions).build();
this.options = bqOptions;
- this.maxRowsPerBatch = MAX_ROWS_PER_BATCH;
+ this.maxRowsPerBatch = bqOptions.getMaxStreamingRowsToBatch();
+ this.maxRowBatchSize = bqOptions.getMaxStreamingBatchSize();
this.executor = null;
}
@@ -752,7 +752,7 @@
rows.add(out);
dataSize += row.toString().length();
- if (dataSize >= UPLOAD_BATCH_SIZE_BYTES
+ if (dataSize >= maxRowBatchSize
|| rows.size() >= maxRowsPerBatch
|| i == rowsToPublish.size() - 1) {
TableDataInsertAllRequest content = new TableDataInsertAllRequest();
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java
index 3ca7aba..ea2c020 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java
@@ -190,6 +190,9 @@
TupleTag<T> failedInsertsTag,
AtomicCoder<T> coder,
ErrorContainer<T> errorContainer) {
+ BigQueryOptions options = input.getPipeline().getOptions().as(BigQueryOptions.class);
+ int numShards = options.getNumStreamingKeys();
+
// A naive implementation would be to simply stream data directly to BigQuery.
// However, this could occasionally lead to duplicated data, e.g., when
// a VM that runs this code is restarted and the code is re-run.
@@ -204,7 +207,7 @@
// streaming insert quota.
PCollection<KV<ShardedKey<String>, TableRowInfo<ElementT>>> tagged =
input
- .apply("ShardTableWrites", ParDo.of(new GenerateShardedTable<>(50)))
+ .apply("ShardTableWrites", ParDo.of(new GenerateShardedTable<>(numShards)))
.setCoder(KvCoder.of(ShardedKeyCoder.of(StringUtf8Coder.of()), elementCoder))
.apply("TagWithUniqueIds", ParDo.of(new TagWithUniqueIds<>()))
.setCoder(
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java
index ed10c21..627a182 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TestBigQuery.java
@@ -36,12 +36,12 @@
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import org.apache.beam.sdk.extensions.gcp.auth.NullCredentialInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.hamcrest.Matcher;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
index aecdb2c..d35fc74 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
@@ -67,6 +67,7 @@
import org.apache.beam.sdk.annotations.Experimental.Kind;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.options.PipelineOptions;
@@ -85,7 +86,6 @@
import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
import org.apache.beam.sdk.util.Sleeper;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PBegin;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java
index 9277055..5e423fb 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java
@@ -45,8 +45,8 @@
import java.util.Map;
import java.util.TreeMap;
import javax.annotation.Nullable;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
-import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
import org.apache.beam.vendor.guava.v20_0.com.google.common.base.Strings;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClient.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClient.java
index abf8a2e..d49d060 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClient.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/BigqueryClient.java
@@ -60,9 +60,9 @@
import java.util.stream.Collectors;
import javax.annotation.Nonnull;
import javax.annotation.Nullable;
-import org.apache.beam.sdk.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.util.FluentBackoff;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
import org.joda.time.Duration;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java
index c2d751c..d630799 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpersTest.java
@@ -32,12 +32,12 @@
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.ShardedKeyCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.PendingJob;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.PendingJobManager;
import org.apache.beam.sdk.testing.CoderProperties;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.CoderUtils;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.WindowedValue;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java
index ec36475..29c9dcb 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImplTest.java
@@ -65,17 +65,17 @@
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.FastNanoClockAndSleeper;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.JobServiceImpl;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.ExpectedLogs;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
-import org.apache.beam.sdk.util.BackOffAdapter;
-import org.apache.beam.sdk.util.FastNanoClockAndSleeper;
import org.apache.beam.sdk.util.FluentBackoff;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
index 437bbf9..596108d 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
@@ -38,6 +38,7 @@
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VoidCoder;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
import org.apache.beam.sdk.io.gcp.testing.BigqueryClient;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
@@ -50,7 +51,6 @@
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.WithKeys;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeJobService.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeJobService.java
index c5ce93b..436dcdb 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeJobService.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeJobService.java
@@ -61,15 +61,15 @@
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.Coder.Context;
+import org.apache.beam.sdk.extensions.gcp.util.BackOffAdapter;
+import org.apache.beam.sdk.extensions.gcp.util.Transport;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.JobService;
-import org.apache.beam.sdk.util.BackOffAdapter;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.MimeTypes;
-import org.apache.beam.sdk.util.Transport;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.HashBasedTable;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java
index a3ed49d..82cfec4 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1TestUtil.java
@@ -51,12 +51,12 @@
import java.util.UUID;
import javax.annotation.Nullable;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
+import org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
-import org.apache.beam.sdk.util.RetryHttpRequestInitializer;
import org.apache.beam.sdk.util.Sleeper;
import org.joda.time.Duration;
import org.slf4j.Logger;
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
index 2f64b1f..bee548e 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/storage/GcsKmsKeyIT.java
@@ -29,6 +29,8 @@
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.PipelineResult.State;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
+import org.apache.beam.sdk.extensions.gcp.util.GcsUtil;
+import org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.fs.MatchResult;
@@ -39,8 +41,6 @@
import org.apache.beam.sdk.testing.FileChecksumMatcher;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;
-import org.apache.beam.sdk.util.GcsUtil;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
import org.junit.BeforeClass;
import org.junit.Test;
diff --git a/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java b/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
index 1d7fc32..8819024 100644
--- a/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
+++ b/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
@@ -247,7 +247,7 @@
* MyDbOutputFormatKeyClass, Object.class);
* myHadoopConfiguration.setClass("mapreduce.job.output.value.class",
* MyDbOutputFormatValueClass, Object.class);
- * myHadoopConfiguration.setClass("mapreduce.job.output.value.class",
+ * myHadoopConfiguration.setClass("mapreduce.job.partitioner.class",
* MyPartitionerClass, Object.class);
* myHadoopConfiguration.setInt("mapreduce.job.reduces", 2);
* }</pre>
diff --git a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java
index 6000c79..04e80b8 100644
--- a/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java
+++ b/sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java
@@ -23,7 +23,14 @@
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.Table;
+import java.io.File;
+import java.io.IOException;
import java.io.Serializable;
+import java.net.ServerSocket;
+import java.net.URI;
+import java.nio.charset.Charset;
+import java.nio.file.Files;
+import java.nio.file.Path;
import org.apache.beam.sdk.io.common.HashingFn;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
@@ -58,8 +65,8 @@
private static final String CASSANDRA_PARTITIONER_CLASS_VALUE = "Murmur3Partitioner";
private static final String CASSANDRA_KEYSPACE_PROPERTY = "cassandra.input.keyspace";
private static final String CASSANDRA_COLUMNFAMILY_PROPERTY = "cassandra.input.columnfamily";
- private static final String CASSANDRA_PORT = "9061";
- private static final String CASSANDRA_NATIVE_PORT = "9042";
+ private static int cassandraPort;
+ private static int cassandraNativePort;
private static transient Cluster cluster;
private static transient Session session;
private static final long TEST_DATA_ROW_COUNT = 10L;
@@ -67,6 +74,13 @@
@Rule public final transient TestPipeline p = TestPipeline.create();
+ private static int getFreeLocalPort() throws IOException {
+ ServerSocket serverSocket = new ServerSocket(0);
+ int port = serverSocket.getLocalPort();
+ serverSocket.close();
+ return port;
+ }
+
/**
* Test to read data from embedded Cassandra instance and verify whether data is read
* successfully.
@@ -140,8 +154,8 @@
*/
private Configuration getConfiguration() {
Configuration conf = new Configuration();
- conf.set(CASSANDRA_NATIVE_PORT_PROPERTY, CASSANDRA_NATIVE_PORT);
- conf.set(CASSANDRA_THRIFT_PORT_PROPERTY, CASSANDRA_PORT);
+ conf.set(CASSANDRA_NATIVE_PORT_PROPERTY, String.valueOf(cassandraNativePort));
+ conf.set(CASSANDRA_THRIFT_PORT_PROPERTY, String.valueOf(cassandraPort));
conf.set(CASSANDRA_THRIFT_ADDRESS_PROPERTY, CASSANDRA_HOST);
conf.set(CASSANDRA_PARTITIONER_CLASS_PROPERTY, CASSANDRA_PARTITIONER_CLASS_VALUE);
conf.set(CASSANDRA_KEYSPACE_PROPERTY, CASSANDRA_KEYSPACE);
@@ -178,6 +192,9 @@
@BeforeClass
public static void startCassandra() throws Exception {
+ cassandraPort = getFreeLocalPort();
+ cassandraNativePort = getFreeLocalPort();
+ replacePortsInConfFile();
// Start the Embedded Cassandra Service
cassandra.start();
final SocketOptions socketOptions = new SocketOptions();
@@ -190,12 +207,21 @@
.addContactPoint(CASSANDRA_HOST)
.withClusterName("beam")
.withSocketOptions(socketOptions)
- .withPort(Integer.valueOf(CASSANDRA_NATIVE_PORT))
+ .withPort(cassandraNativePort)
.build();
session = cluster.connect();
createCassandraData();
}
+ private static void replacePortsInConfFile() throws Exception {
+ URI uri = HadoopFormatIOCassandraTest.class.getResource("/cassandra.yaml").toURI();
+ Path cassandraYamlPath = new File(uri).toPath();
+ String content = new String(Files.readAllBytes(cassandraYamlPath), Charset.defaultCharset());
+ content = content.replaceAll("9042", String.valueOf(cassandraNativePort));
+ content = content.replaceAll("9061", String.valueOf(cassandraPort));
+ Files.write(cassandraYamlPath, content.getBytes(Charset.defaultCharset()));
+ }
+
@AfterClass
public static void stopEmbeddedCassandra() {
session.close();
diff --git a/sdks/java/io/hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithEmbeddedCassandraTest.java b/sdks/java/io/hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithEmbeddedCassandraTest.java
index c8d7d8d..a082b75 100644
--- a/sdks/java/io/hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithEmbeddedCassandraTest.java
+++ b/sdks/java/io/hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithEmbeddedCassandraTest.java
@@ -23,7 +23,14 @@
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.Table;
+import java.io.File;
+import java.io.IOException;
import java.io.Serializable;
+import java.net.ServerSocket;
+import java.net.URI;
+import java.nio.charset.Charset;
+import java.nio.file.Files;
+import java.nio.file.Path;
import org.apache.beam.sdk.io.common.HashingFn;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
@@ -58,8 +65,8 @@
private static final String CASSANDRA_PARTITIONER_CLASS_VALUE = "Murmur3Partitioner";
private static final String CASSANDRA_KEYSPACE_PROPERTY = "cassandra.input.keyspace";
private static final String CASSANDRA_COLUMNFAMILY_PROPERTY = "cassandra.input.columnfamily";
- private static final String CASSANDRA_PORT = "9062";
- private static final String CASSANDRA_NATIVE_PORT = "9043";
+ private static int cassandraPort;
+ private static int cassandraNativePort;
private static transient Cluster cluster;
private static transient Session session;
private static final long TEST_DATA_ROW_COUNT = 10L;
@@ -67,6 +74,13 @@
@Rule public final transient TestPipeline p = TestPipeline.create();
+ private static int getFreeLocalPort() throws IOException {
+ ServerSocket serverSocket = new ServerSocket(0);
+ int port = serverSocket.getLocalPort();
+ serverSocket.close();
+ return port;
+ }
+
/**
* Test to read data from embedded Cassandra instance and verify whether data is read
* successfully.
@@ -140,8 +154,8 @@
*/
private Configuration getConfiguration() {
Configuration conf = new Configuration();
- conf.set(CASSANDRA_NATIVE_PORT_PROPERTY, CASSANDRA_NATIVE_PORT);
- conf.set(CASSANDRA_THRIFT_PORT_PROPERTY, CASSANDRA_PORT);
+ conf.set(CASSANDRA_NATIVE_PORT_PROPERTY, String.valueOf(cassandraNativePort));
+ conf.set(CASSANDRA_THRIFT_PORT_PROPERTY, String.valueOf(cassandraPort));
conf.set(CASSANDRA_THRIFT_ADDRESS_PROPERTY, CASSANDRA_HOST);
conf.set(CASSANDRA_PARTITIONER_CLASS_PROPERTY, CASSANDRA_PARTITIONER_CLASS_VALUE);
conf.set(CASSANDRA_KEYSPACE_PROPERTY, CASSANDRA_KEYSPACE);
@@ -178,6 +192,9 @@
@BeforeClass
public static void startCassandra() throws Exception {
+ cassandraPort = getFreeLocalPort();
+ cassandraNativePort = getFreeLocalPort();
+ replacePortsInConfFile();
// Start the Embedded Cassandra Service
cassandra.start();
final SocketOptions socketOptions = new SocketOptions();
@@ -190,12 +207,21 @@
.addContactPoint(CASSANDRA_HOST)
.withClusterName("beam")
.withSocketOptions(socketOptions)
- .withPort(Integer.valueOf(CASSANDRA_NATIVE_PORT))
+ .withPort(cassandraNativePort)
.build();
session = cluster.connect();
createCassandraData();
}
+ private static void replacePortsInConfFile() throws Exception {
+ URI uri = HIFIOWithEmbeddedCassandraTest.class.getResource("/cassandra.yaml").toURI();
+ Path cassandraYamlPath = new File(uri).toPath();
+ String content = new String(Files.readAllBytes(cassandraYamlPath), Charset.defaultCharset());
+ content = content.replaceAll("9043", String.valueOf(cassandraNativePort));
+ content = content.replaceAll("9161", String.valueOf(cassandraPort));
+ Files.write(cassandraYamlPath, content.getBytes(Charset.defaultCharset()));
+ }
+
@AfterClass
public static void stopEmbeddedCassandra() {
session.close();
diff --git a/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java b/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java
index a3c17e6..dc99b76 100644
--- a/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java
+++ b/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java
@@ -25,6 +25,7 @@
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.DoFn.BoundedPerElement;
import org.apache.beam.sdk.transforms.splittabledofn.ByteKeyRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
@@ -57,7 +58,8 @@
}
@ProcessElement
- public void processElement(ProcessContext c, ByteKeyRangeTracker tracker) throws Exception {
+ public void processElement(ProcessContext c, RestrictionTracker<ByteKeyRange, ByteKey> tracker)
+ throws Exception {
final HBaseQuery query = c.element();
TableName tableName = TableName.valueOf(query.getTableId());
Table table = connection.getTable(tableName);
diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
index 354b26c..e6f2699 100644
--- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
+++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
@@ -160,7 +160,10 @@
* @param <T> Type of the data to be read.
*/
public static <T> Read<T> read() {
- return new AutoValue_JdbcIO_Read.Builder<T>().setFetchSize(DEFAULT_FETCH_SIZE).build();
+ return new AutoValue_JdbcIO_Read.Builder<T>()
+ .setFetchSize(DEFAULT_FETCH_SIZE)
+ .setOutputParallelization(true)
+ .build();
}
/**
@@ -173,6 +176,7 @@
public static <ParameterT, OutputT> ReadAll<ParameterT, OutputT> readAll() {
return new AutoValue_JdbcIO_ReadAll.Builder<ParameterT, OutputT>()
.setFetchSize(DEFAULT_FETCH_SIZE)
+ .setOutputParallelization(true)
.build();
}
@@ -399,6 +403,8 @@
abstract int getFetchSize();
+ abstract boolean getOutputParallelization();
+
abstract Builder<T> toBuilder();
@AutoValue.Builder
@@ -415,6 +421,8 @@
abstract Builder<T> setFetchSize(int fetchSize);
+ abstract Builder<T> setOutputParallelization(boolean outputParallelization);
+
abstract Read<T> build();
}
@@ -457,6 +465,14 @@
return toBuilder().setFetchSize(fetchSize).build();
}
+ /**
+ * Whether to reshuffle the resulting PCollection so results are distributed to all workers. The
+ * default is to parallelize and should only be changed if this is known to be unnecessary.
+ */
+ public Read<T> withOutputParallelization(boolean outputParallelization) {
+ return toBuilder().setOutputParallelization(outputParallelization).build();
+ }
+
@Override
public PCollection<T> expand(PBegin input) {
checkArgument(getQuery() != null, "withQuery() is required");
@@ -474,6 +490,7 @@
.withCoder(getCoder())
.withRowMapper(getRowMapper())
.withFetchSize(getFetchSize())
+ .withOutputParallelization(getOutputParallelization())
.withParameterSetter(
(element, preparedStatement) -> {
if (getStatementPreparator() != null) {
@@ -515,6 +532,8 @@
abstract int getFetchSize();
+ abstract boolean getOutputParallelization();
+
abstract Builder<ParameterT, OutputT> toBuilder();
@AutoValue.Builder
@@ -533,6 +552,8 @@
abstract Builder<ParameterT, OutputT> setFetchSize(int fetchSize);
+ abstract Builder<ParameterT, OutputT> setOutputParallelization(boolean outputParallelization);
+
abstract ReadAll<ParameterT, OutputT> build();
}
@@ -582,19 +603,33 @@
return toBuilder().setFetchSize(fetchSize).build();
}
+ /**
+ * Whether to reshuffle the resulting PCollection so results are distributed to all workers. The
+ * default is to parallelize and should only be changed if this is known to be unnecessary.
+ */
+ public ReadAll<ParameterT, OutputT> withOutputParallelization(boolean outputParallelization) {
+ return toBuilder().setOutputParallelization(outputParallelization).build();
+ }
+
@Override
public PCollection<OutputT> expand(PCollection<ParameterT> input) {
- return input
- .apply(
- ParDo.of(
- new ReadFn<>(
- getDataSourceConfiguration(),
- getQuery(),
- getParameterSetter(),
- getRowMapper(),
- getFetchSize())))
- .setCoder(getCoder())
- .apply(new Reparallelize<>());
+ PCollection<OutputT> output =
+ input
+ .apply(
+ ParDo.of(
+ new ReadFn<>(
+ getDataSourceConfiguration(),
+ getQuery(),
+ getParameterSetter(),
+ getRowMapper(),
+ getFetchSize())))
+ .setCoder(getCoder());
+
+ if (getOutputParallelization()) {
+ output = output.apply(new Reparallelize<>());
+ }
+
+ return output;
}
@Override
diff --git a/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java b/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
index 111789b..dc7cde3 100644
--- a/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
+++ b/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java
@@ -121,10 +121,8 @@
@AutoValue
public abstract static class ConnectionConfiguration implements Serializable {
- @Nullable
abstract String getServerUri();
- @Nullable
abstract String getTopic();
@Nullable
@@ -177,23 +175,40 @@
* @param topic The MQTT getTopic pattern.
* @param clientId A client ID prefix, used to construct an unique client ID.
* @return A connection configuration to the MQTT broker.
+ * @deprecated This constructor will be removed in a future version of Beam, please use
+ * {@link #create(String, String)} and {@link #withClientId(String)} instead.
*/
+ @Deprecated
public static ConnectionConfiguration create(String serverUri, String topic, String clientId) {
- checkArgument(serverUri != null, "serverUri can not be null");
- checkArgument(topic != null, "topic can not be null");
checkArgument(clientId != null, "clientId can not be null");
- return new AutoValue_MqttIO_ConnectionConfiguration.Builder()
- .setServerUri(serverUri)
- .setTopic(topic)
- .setClientId(clientId)
- .build();
+ return create(serverUri, topic).withClientId(clientId);
+ }
+
+ /** Set up the MQTT broker URI. */
+ public ConnectionConfiguration withServerUri(String serverUri) {
+ checkArgument(serverUri != null, "serverUri can not be null");
+ return builder().setServerUri(serverUri).build();
+ }
+
+ /** Set up the MQTT topic pattern. */
+ public ConnectionConfiguration withTopic(String topic) {
+ checkArgument(topic != null, "topic can not be null");
+ return builder().setTopic(topic).build();
+ }
+
+ /** Set up the client ID prefix, which is used to construct a unique client ID. */
+ public ConnectionConfiguration withClientId(String clientId) {
+ checkArgument(clientId != null, "clientId can not be null");
+ return builder().setClientId(clientId).build();
}
public ConnectionConfiguration withUsername(String username) {
+ checkArgument(username != null, "username can not be null");
return builder().setUsername(username).build();
}
public ConnectionConfiguration withPassword(String password) {
+ checkArgument(password != null, "password can not be null");
return builder().setPassword(password).build();
}
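
Editor's note: the three-argument create(serverUri, topic, clientId) is now deprecated in favour of create(serverUri, topic) plus withClientId, matching the other optional withers. A small sketch with placeholder broker URI, topic, and client ID:

    import org.apache.beam.sdk.io.mqtt.MqttIO;

    class MqttConfigSketch {
      static MqttIO.ConnectionConfiguration connectionConfiguration() {
        // Required fields via create(); the optional client ID via its own wither.
        return MqttIO.ConnectionConfiguration.create("tcp://localhost:1883", "MY_TOPIC")
            .withClientId("MY_PIPELINE");
      }
    }
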
diff --git a/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java b/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java
index b4fd2da..58ef330 100644
--- a/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java
+++ b/sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java
@@ -150,8 +150,8 @@
pipeline.apply(
MqttIO.read()
.withConnectionConfiguration(
- MqttIO.ConnectionConfiguration.create(
- "tcp://localhost:" + port, "READ_TOPIC", "READ_PIPELINE"))
+ MqttIO.ConnectionConfiguration.create("tcp://localhost:" + port, "READ_TOPIC")
+ .withClientId("READ_PIPELINE"))
.withMaxReadTime(Duration.standardSeconds(3)));
PAssert.that(output)
.containsInAnyOrder(
@@ -212,8 +212,8 @@
pipeline.apply(
MqttIO.read()
.withConnectionConfiguration(
- MqttIO.ConnectionConfiguration.create(
- "tcp://localhost:" + port, "READ_TOPIC", "READ_PIPELINE"))
+ MqttIO.ConnectionConfiguration.create("tcp://localhost:" + port, "READ_TOPIC")
+ .withClientId("READ_PIPELINE"))
.withMaxReadTime(Duration.standardSeconds(2)));
// should stop before the test timeout
diff --git a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisConnectionConfiguration.java b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisConnectionConfiguration.java
index 9090baa..6f5ff52 100644
--- a/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisConnectionConfiguration.java
+++ b/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisConnectionConfiguration.java
@@ -42,6 +42,8 @@
abstract int timeout();
+ abstract boolean ssl();
+
abstract Builder builder();
@AutoValue.Builder
@@ -54,6 +56,8 @@
abstract Builder setTimeout(int timeout);
+ abstract Builder setSsl(boolean ssl);
+
abstract RedisConnectionConfiguration build();
}
@@ -62,6 +66,7 @@
.setHost(Protocol.DEFAULT_HOST)
.setPort(Protocol.DEFAULT_PORT)
.setTimeout(Protocol.DEFAULT_TIMEOUT)
+ .setSsl(false)
.build();
}
@@ -70,6 +75,7 @@
.setHost(host)
.setPort(port)
.setTimeout(Protocol.DEFAULT_TIMEOUT)
+ .setSsl(false)
.build();
}
@@ -99,9 +105,14 @@
return builder().setTimeout(timeout).build();
}
+ /** Enable SSL connection to the Redis server. */
+ public RedisConnectionConfiguration enableSSL() {
+ return builder().setSsl(true).build();
+ }
+
/** Connect to the Redis instance. */
public Jedis connect() {
- Jedis jedis = new Jedis(host(), port(), timeout());
+ Jedis jedis = new Jedis(host(), port(), timeout(), ssl());
if (auth() != null) {
jedis.auth(auth());
}
@@ -113,5 +124,6 @@
builder.add(DisplayData.item("host", host()));
builder.add(DisplayData.item("port", port()));
builder.addIfNotNull(DisplayData.item("timeout", timeout()));
+ builder.add(DisplayData.item("ssl", ssl()));
}
}
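
Editor's note: enableSSL() sets the new flag, which is passed straight through to the Jedis constructor; SSL stays disabled by default, so existing pipelines are unaffected. A short sketch with a placeholder host:

    import org.apache.beam.sdk.io.redis.RedisConnectionConfiguration;

    class RedisSslSketch {
      static RedisConnectionConfiguration sslConfiguration() {
        // ssl() defaults to false; enableSSL() flips the flag handed to Jedis.
        return RedisConnectionConfiguration.create("redis.example.com", 6379)
            .enableSSL();
      }
    }
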
diff --git a/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java b/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java
index c33355c..585c238 100644
--- a/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java
+++ b/sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java
@@ -249,6 +249,7 @@
Assert.assertEquals(111, read.connectionConfiguration().port());
Assert.assertEquals("pass", read.connectionConfiguration().auth());
Assert.assertEquals(5, read.connectionConfiguration().timeout());
+ Assert.assertEquals(false, read.connectionConfiguration().ssl());
}
@Test
@@ -258,6 +259,7 @@
Assert.assertEquals(111, write.connectionConfiguration().port());
Assert.assertEquals("pass", write.connectionConfiguration().auth());
Assert.assertEquals(5, write.connectionConfiguration().timeout());
+ Assert.assertEquals(false, write.connectionConfiguration().ssl());
Assert.assertEquals(Method.APPEND, write.method());
}
diff --git a/sdks/java/javadoc/ant.xml b/sdks/java/javadoc/ant.xml
deleted file mode 100644
index e30d604..0000000
--- a/sdks/java/javadoc/ant.xml
+++ /dev/null
@@ -1,100 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
-<!--
- Ant tasks used to create Javadoc across all of Beam.
--->
-<project>
-
- <property name="collect.dir" location="target/collected"/>
- <property name="output.dir" location="../target/site/javadoc"/>
-
- <!-- One target to do the work. It collects the sources and runs javadoc. -->
- <target name="javadoc">
- <!-- suck up all the java code into one place. -->
- <copy todir="${collect.dir}">
- <fileset dir="..">
- <include name="**/src/main/java/**/*.java"/>
- <exclude name="**/maven-archetypes/**"/>
- <exclude name="**/nexmark/**"/>
- </fileset>
- <!-- For each pathname, turn X/src/main/java/Y to Y. This
- results in one Java source tree. -->
- <mapper type="regexp" from="^.*/src/main/java/(.*)$$" to="\1"/>
- </copy>
-
- <copy todir="${collect.dir}">
- <fileset dir="../../../runners">
- <include name="**/src/main/java/**/*.java"/>
- </fileset>
- <!-- For each pathname, turn X/src/main/java/Y to Y. This
- results in one Java source tree. -->
- <mapper type="regexp" from="^.*/src/main/java/(.*)$$" to="\1"/>
- </copy>
-
- <!-- Run javadoc. -->
- <javadoc sourcepath="${collect.dir}"
- destdir="${output.dir}"
- classpath="${compile_classpath}"
- Overview="overview.html"
- >
- <!-- Get some Javadoc options set up by Maven. -->
- <arg line="${beam.javadoc_opts}"/>
- <!-- Setup offline links for some packages. -->
- <link href="http://avro.apache.org/docs/1.7.7/api/java/"
- offline="true" packageListLoc="avro-docs"/>
- <link
- href="https://developers.google.com/api-client-library/java/google-api-java-client/reference/1.22.0/"
- offline="true" packageListLoc="apiclient-docs"/>
- <link
- href="https://developers.google.com/api-client-library/java/google-oauth-java-client/reference/1.22.0/"
- offline="true" packageListLoc="oauth-docs"/>
- <link
- href="https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/"
- offline="true" packageListLoc="bq-docs"/>
- <link href="http://googlecloudplatform.github.io/google-cloud-java/0.8.0/apidocs/index.html"
- offline="true" packageListLoc="datastore-docs"/>
- <link href="http://google.github.io/guava/releases/20.0/api/docs/"
- offline="true" packageListLoc="guava-docs"/>
- <link href="http://fasterxml.github.io/jackson-annotations/javadoc/2.7/"
- offline="true" packageListLoc="jackson-annotations-docs"/>
- <link href="http://fasterxml.github.io/jackson-databind/javadoc/2.7/"
- offline="true" packageListLoc="jackson-databind-docs"/>
- <link href="http://hamcrest.org/JavaHamcrest/javadoc/1.3/"
- offline="true" packageListLoc="hamcrest-docs"/>
- <link href="http://www.joda.org/joda-time/apidocs"
- offline="true" packageListLoc="joda-docs"/>
- <link href="http://junit.sourceforge.net/javadoc/"
- offline="true" packageListLoc="junit-docs"/>
-
- <excludepackage name="org.apache.beam.examples.*"/>
- <excludepackage name="org.apache.beam.fn.*"/>
- <excludepackage name="org.apache.beam.runners.apex.translation.*"/>
- <excludepackage name="org.apache.beam.runners.core.*"/>
- <excludepackage name="org.apache.beam.runners.dataflow.internal.*"/>
- <excludepackage name="org.apache.beam.runners.flink.examples.*"/>
- <excludepackage name="org.apache.beam.runners.flink.translation.*"/>
- <excludepackage name="org.apache.beam.runners.spark.examples.*"/>
- <excludepackage name="org.apache.beam.runners.spark.translation.*"/>
- <excludepackage name="org.apache.beam.sdk.transforms.reflect.*"/>
- <excludepackage name="org.apache.beam.sdk.runners.*"/>
- <excludepackage name="org.apache.beam.sdk.util.*"/>
- </javadoc>
- <!-- make a jar file to push to central -->
- <jar destfile="${output.jar}" basedir="${output.dir}"/>
- </target>
-</project>
diff --git a/sdks/java/javadoc/build.gradle b/sdks/java/javadoc/build.gradle
index 986b0c1..d32cc44 100644
--- a/sdks/java/javadoc/build.gradle
+++ b/sdks/java/javadoc/build.gradle
@@ -23,7 +23,8 @@
* Generated files will be located under beam/sdks/java/javadoc/build/docs/javadoc and are
* used as part of the beam-site source tree.
*/
-plugins { id 'java' }
+plugins { id 'org.apache.beam.module' }
+applyJavaNature()
description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc"
for (p in rootProject.subprojects) {
@@ -74,8 +75,6 @@
failOnError false
title "Apache Beam " + project.version
overview 'overview.html'
- addStringOption('Xdoclint:all', '-quiet')
- addStringOption('Xdoclint:-missing', '-quiet')
linksOffline 'http://avro.apache.org/docs/1.7.7/api/java/', 'avro-docs'
linksOffline 'https://developers.google.com/api-client-library/java/google-api-java-client/reference/1.22.0/', 'apiclient-docs'
linksOffline 'https://developers.google.com/api-client-library/java/google-oauth-java-client/reference/1.22.0/', 'oauth-docs'
diff --git a/sdks/python/apache_beam/coders/standard_coders_test.py b/sdks/python/apache_beam/coders/standard_coders_test.py
index cb9d43b..79c06f4 100644
--- a/sdks/python/apache_beam/coders/standard_coders_test.py
+++ b/sdks/python/apache_beam/coders/standard_coders_test.py
@@ -30,7 +30,8 @@
import yaml
from apache_beam.coders import coder_impl
-from apache_beam.coders import coders
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.runners import pipeline_context
from apache_beam.transforms import window
from apache_beam.transforms.window import IntervalWindow
from apache_beam.utils import windowed_value
@@ -55,18 +56,6 @@
class StandardCodersTest(unittest.TestCase):
- _urn_to_coder_class = {
- 'beam:coder:bytes:v1': coders.BytesCoder,
- 'beam:coder:varint:v1': coders.VarIntCoder,
- 'beam:coder:kv:v1': lambda k, v: coders.TupleCoder((k, v)),
- 'beam:coder:interval_window:v1': coders.IntervalWindowCoder,
- 'beam:coder:iterable:v1': lambda t: coders.IterableCoder(t),
- 'beam:coder:global_window:v1': coders.GlobalWindowCoder,
- 'beam:coder:windowed_value:v1':
- lambda v, w: coders.WindowedValueCoder(v, w),
- 'beam:coder:timer:v1': coders._TimerCoder,
- }
-
_urn_to_json_value_parser = {
'beam:coder:bytes:v1': lambda x: x.encode('utf-8'),
'beam:coder:varint:v1': lambda x: x,
@@ -87,6 +76,7 @@
lambda x, payload_parser: dict(
payload=payload_parser(x['payload']),
timestamp=Timestamp(micros=x['timestamp'])),
+ 'beam:coder:double:v1': lambda x: float(x),
}
def test_standard_coders(self):
@@ -95,6 +85,16 @@
self._run_standard_coder(name, spec)
def _run_standard_coder(self, name, spec):
+ def assert_equal(actual, expected):
+ """Handle nan values which self.assertEqual fails on."""
+ import math
+ if (isinstance(actual, float)
+ and isinstance(expected, float)
+ and math.isnan(actual)
+ and math.isnan(expected)):
+ return
+ self.assertEqual(actual, expected)
+
coder = self.parse_coder(spec['coder'])
parse_value = self.json_value_parser(spec['coder'])
nested_list = [spec['nested']] if 'nested' in spec else [True, False]
@@ -108,16 +108,24 @@
self.to_fix[spec['index'], expected_encoded] = actual_encoded
else:
self.assertEqual(expected_encoded, actual_encoded)
- self.assertEqual(decode_nested(coder, expected_encoded, nested),
- value)
+ decoded = decode_nested(coder, expected_encoded, nested)
+ assert_equal(decoded, value)
else:
# Only verify decoding for a non-deterministic coder
self.assertEqual(decode_nested(coder, expected_encoded, nested),
value)
def parse_coder(self, spec):
- return self._urn_to_coder_class[spec['urn']](
- *[self.parse_coder(c) for c in spec.get('components', ())])
+ context = pipeline_context.PipelineContext()
+ coder_id = str(hash(str(spec)))
+ component_ids = [context.coders.get_id(self.parse_coder(c))
+ for c in spec.get('components', ())]
+ context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
+ spec=beam_runner_api_pb2.SdkFunctionSpec(
+ spec=beam_runner_api_pb2.FunctionSpec(
+ urn=spec['urn'], payload=spec.get('payload'))),
+ component_coder_ids=component_ids))
+ return context.coders.get_by_id(coder_id)
def json_value_parser(self, coder_spec):
component_parsers = [
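The new parse_coder above builds a runner API Coder proto and lets the pipeline context materialize it, rather than looking coders up in the removed hard-coded URN map. A minimal sketch of that round trip, using only the classes already imported in this file (the coder id and the varint URN are illustrative choices):

```python
from apache_beam.portability.api import beam_runner_api_pb2
from apache_beam.runners import pipeline_context

context = pipeline_context.PipelineContext()

# Register a proto for a standard coder under an arbitrary id ...
context.coders.put_proto(
    'my_varint_coder',
    beam_runner_api_pb2.Coder(
        spec=beam_runner_api_pb2.SdkFunctionSpec(
            spec=beam_runner_api_pb2.FunctionSpec(
                urn='beam:coder:varint:v1'))))

# ... and resolve it back into a Python coder object through the context.
coder = context.coders.get_by_id('my_varint_coder')
print(coder.encode(42))
```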
diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats.py b/sdks/python/apache_beam/examples/complete/game/game_stats.py
index 5f7a0ed..9b5cc32 100644
--- a/sdks/python/apache_beam/examples/complete/game/game_stats.py
+++ b/sdks/python/apache_beam/examples/complete/game/game_stats.py
@@ -306,6 +306,7 @@
topic=args.topic)
raw_events = (
scores
+ | 'DecodeString' >> beam.Map(lambda b: b.decode('utf-8'))
| 'ParseGameEventFn' >> beam.ParDo(ParseGameEventFn())
| 'AddEventTimestamps' >> beam.Map(
lambda elem: beam.window.TimestampedValue(elem, elem['timestamp'])))
diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
index e7b89aa..cba4b00 100644
--- a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py
@@ -33,8 +33,6 @@
from __future__ import absolute_import
import logging
-import os
-import sys
import time
import unittest
import uuid
@@ -51,10 +49,6 @@
from apache_beam.testing.test_pipeline import TestPipeline
-@unittest.skipIf(sys.version_info[0] == 3 and
- os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
- 'This test still needs to be fixed on Python 3'
- 'TODO: BEAM-6711')
class GameStatsIT(unittest.TestCase):
# Input events containing user, team, score, processing time, window start.
@@ -102,7 +96,8 @@
for _ in range(message_count):
self.pub_client.publish(topic.name,
- self.INPUT_EVENT % self._test_timestamp)
+ (self.INPUT_EVENT % self._test_timestamp
+ ).encode('utf-8'))
def _cleanup_pubsub(self):
test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub])
diff --git a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
index 2fce1fc..5685132 100644
--- a/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/hourly_team_score_it_test.py
@@ -33,8 +33,6 @@
from __future__ import absolute_import
import logging
-import os
-import sys
import unittest
from hamcrest.core.core.allof import all_of
@@ -48,10 +46,6 @@
from apache_beam.testing.test_pipeline import TestPipeline
-@unittest.skipIf(sys.version_info[0] == 3 and
- os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
- 'This test still needs to be fixed on Python 3'
- 'TODO: BEAM-6870')
class HourlyTeamScoreIT(unittest.TestCase):
DEFAULT_INPUT_FILE = 'gs://dataflow-samples/game/gaming_data*'
diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board.py b/sdks/python/apache_beam/examples/complete/game/leader_board.py
index cde1544..43a599b 100644
--- a/sdks/python/apache_beam/examples/complete/game/leader_board.py
+++ b/sdks/python/apache_beam/examples/complete/game/leader_board.py
@@ -315,6 +315,7 @@
events = (
scores
+ | 'DecodeString' >> beam.Map(lambda b: b.decode('utf-8'))
| 'ParseGameEventFn' >> beam.ParDo(ParseGameEventFn())
| 'AddEventTimestamps' >> beam.Map(
lambda elem: beam.window.TimestampedValue(elem, elem['timestamp'])))
diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
index b86e49e..9f057fd 100644
--- a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
+++ b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py
@@ -33,8 +33,6 @@
from __future__ import absolute_import
import logging
-import os
-import sys
import time
import unittest
import uuid
@@ -52,10 +50,6 @@
from apache_beam.testing.test_pipeline import TestPipeline
-@unittest.skipIf(sys.version_info[0] == 3 and
- os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
- 'This test still needs to be fixed on Python 3'
- 'TODO: BEAM-6711')
class LeaderBoardIT(unittest.TestCase):
# Input event containing user, team, score, processing time, window start.
@@ -104,7 +98,8 @@
for _ in range(message_count):
self.pub_client.publish(topic.name,
- self.INPUT_EVENT % self._test_timestamp)
+ (self.INPUT_EVENT % self._test_timestamp
+ ).encode('utf-8'))
def _cleanup_pubsub(self):
test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub])
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py b/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
index 4bc51d1..a3511ff 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
@@ -42,7 +42,6 @@
from apache_beam import pvalue
from apache_beam.io import filesystems as fs
from apache_beam.io.gcp import bigquery_tools
-from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api
from apache_beam.options import value_provider as vp
from apache_beam.options.pipeline_options import GoogleCloudOptions
@@ -90,9 +89,12 @@
def _make_new_file_writer(file_prefix, destination):
- if isinstance(destination, bigquery_api.TableReference):
- destination = '%s:%s.%s' % (
- destination.projectId, destination.datasetId, destination.tableId)
+ destination = bigquery_tools.get_hashable_destination(destination)
+
+ # Windows does not allow : in filenames, so replace it with a period.
+ # Other disallowed characters are:
+ # https://docs.microsoft.com/en-us/windows/desktop/fileio/naming-a-file
+ destination = destination.replace(':', '.')
directory = fs.FileSystems.join(file_prefix, destination)
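A tiny standard-library sketch (no Beam imports; the project, dataset, and table names are made up) of what the sanitization above produces for a fully qualified destination:

```python
import os

# A "project:dataset.table" destination would otherwise put ':' into the
# file path, which Windows does not allow in file names.
destination = 'my-project:my_dataset.my_table'
safe_destination = destination.replace(':', '.')

file_prefix = os.path.join('bq_load', 'job_1234')
print(os.path.join(file_prefix, safe_destination))
# e.g. bq_load/job_1234/my-project.my_dataset.my_table on POSIX systems
```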
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_tools.py b/sdks/python/apache_beam/io/gcp/bigquery_tools.py
index 8a79077..d22e0eb 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery_tools.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery_tools.py
@@ -366,16 +366,16 @@
jobReference=reference))
response = self.client.jobs.Insert(request)
- return response.jobReference.jobId
+ return response.jobReference.jobId, response.jobReference.location
@retry.with_exponential_backoff(
num_retries=MAX_RETRIES,
retry_filter=retry.retry_on_server_errors_and_timeout_filter)
def _get_query_results(self, project_id, job_id,
- page_token=None, max_results=10000):
+ page_token=None, max_results=10000, location=None):
request = bigquery.BigqueryJobsGetQueryResultsRequest(
jobId=job_id, pageToken=page_token, projectId=project_id,
- maxResults=max_results)
+ maxResults=max_results, location=location)
response = self.client.jobs.GetQueryResults(request)
return response
@@ -668,16 +668,18 @@
def run_query(self, project_id, query, use_legacy_sql, flatten_results,
dry_run=False):
- job_id = self._start_query_job(project_id, query, use_legacy_sql,
- flatten_results, job_id=uuid.uuid4().hex,
- dry_run=dry_run)
+ job_id, location = self._start_query_job(project_id, query,
+ use_legacy_sql, flatten_results,
+ job_id=uuid.uuid4().hex,
+ dry_run=dry_run)
if dry_run:
# If this was a dry run then the fact that we get here means the
# query has no errors. The start_query_job would raise an error otherwise.
return
page_token = None
while True:
- response = self._get_query_results(project_id, job_id, page_token)
+ response = self._get_query_results(project_id, job_id,
+ page_token, location=location)
if not response.jobComplete:
# The jobComplete field can be False if the query request times out
# (default is 10 seconds). Note that this is a timeout for the query
diff --git a/sdks/python/apache_beam/io/restriction_trackers.py b/sdks/python/apache_beam/io/restriction_trackers.py
index 3933ae5..fe60a5e 100644
--- a/sdks/python/apache_beam/io/restriction_trackers.py
+++ b/sdks/python/apache_beam/io/restriction_trackers.py
@@ -124,6 +124,9 @@
with self._lock:
return self._range.stop
+ def default_size(self):
+ return self._range.stop - self._range.start
+
def try_claim(self, position):
with self._lock:
if self._last_claim_attempt and position <= self._last_claim_attempt:
diff --git a/sdks/python/apache_beam/metrics/monitoring_infos.py b/sdks/python/apache_beam/metrics/monitoring_infos.py
index 855a73c..aa4615d 100644
--- a/sdks/python/apache_beam/metrics/monitoring_infos.py
+++ b/sdks/python/apache_beam/metrics/monitoring_infos.py
@@ -35,15 +35,18 @@
from apache_beam.portability.api.metrics_pb2 import Metric
from apache_beam.portability.api.metrics_pb2 import MonitoringInfo
-ELEMENT_COUNT_URN = common_urns.monitoring_infos.ELEMENT_COUNT.urn
-START_BUNDLE_MSECS_URN = common_urns.monitoring_infos.START_BUNDLE_MSECS.urn
-PROCESS_BUNDLE_MSECS_URN = common_urns.monitoring_infos.PROCESS_BUNDLE_MSECS.urn
-FINISH_BUNDLE_MSECS_URN = common_urns.monitoring_infos.FINISH_BUNDLE_MSECS.urn
-TOTAL_MSECS_URN = common_urns.monitoring_infos.TOTAL_MSECS.urn
+ELEMENT_COUNT_URN = common_urns.monitoring_info_specs.ELEMENT_COUNT.spec.urn
+START_BUNDLE_MSECS_URN = (
+ common_urns.monitoring_info_specs.START_BUNDLE_MSECS.spec.urn)
+PROCESS_BUNDLE_MSECS_URN = (
+ common_urns.monitoring_info_specs.PROCESS_BUNDLE_MSECS.spec.urn)
+FINISH_BUNDLE_MSECS_URN = (
+ common_urns.monitoring_info_specs.FINISH_BUNDLE_MSECS.spec.urn)
+TOTAL_MSECS_URN = common_urns.monitoring_info_specs.TOTAL_MSECS.spec.urn
USER_COUNTER_URN_PREFIX = (
- common_urns.monitoring_infos.USER_COUNTER_URN_PREFIX.urn)
+ common_urns.monitoring_info_specs.USER_COUNTER.spec.urn)
USER_DISTRIBUTION_COUNTER_URN_PREFIX = (
- common_urns.monitoring_infos.USER_DISTRIBUTION_COUNTER_URN_PREFIX.urn)
+ common_urns.monitoring_info_specs.USER_DISTRIBUTION_COUNTER.spec.urn)
# TODO(ajamato): Implement the remaining types, i.e. Double types
# Extrema types, etc. See:
diff --git a/sdks/python/apache_beam/portability/common_urns.py b/sdks/python/apache_beam/portability/common_urns.py
index edbb681..693133b 100644
--- a/sdks/python/apache_beam/portability/common_urns.py
+++ b/sdks/python/apache_beam/portability/common_urns.py
@@ -28,11 +28,12 @@
class PropertiesFromEnumValue(object):
def __init__(self, value_descriptor):
- self.urn = (
- value_descriptor.GetOptions().Extensions[beam_runner_api_pb2.beam_urn])
- self.constant = (
- value_descriptor.GetOptions().Extensions[
- beam_runner_api_pb2.beam_constant])
+ self.urn = (value_descriptor.GetOptions().Extensions[
+ beam_runner_api_pb2.beam_urn])
+ self.constant = (value_descriptor.GetOptions().Extensions[
+ beam_runner_api_pb2.beam_constant])
+ self.spec = (value_descriptor.GetOptions().Extensions[
+ metrics_pb2.monitoring_info_spec])
class PropertiesFromEnumType(object):
@@ -77,7 +78,7 @@
session_windows = PropertiesFromPayloadType(
standard_window_fns_pb2.SessionsPayload)
-monitoring_infos = PropertiesFromEnumType(
- metrics_pb2.MonitoringInfoUrns.Enum)
+monitoring_info_specs = PropertiesFromEnumType(
+ metrics_pb2.MonitoringInfoSpecs.Enum)
monitoring_info_types = PropertiesFromEnumType(
metrics_pb2.MonitoringInfoTypeUrns.Enum)
diff --git a/sdks/python/apache_beam/runners/common.py b/sdks/python/apache_beam/runners/common.py
index 3438790..f1fda35 100644
--- a/sdks/python/apache_beam/runners/common.py
+++ b/sdks/python/apache_beam/runners/common.py
@@ -638,9 +638,11 @@
deferred_status = self.restriction_tracker.deferred_status()
if deferred_status:
deferred_restriction, deferred_watermark = deferred_status
+ element = windowed_value.value
+ size = self.signature.get_restriction_provider().restriction_size(
+ element, deferred_restriction)
return (
- windowed_value.with_value(
- (windowed_value.value, deferred_restriction)),
+ windowed_value.with_value(((element, deferred_restriction), size)),
deferred_watermark)
def try_split(self, fraction):
@@ -651,10 +653,15 @@
if split:
primary, residual = split
element = self.current_windowed_value.value
+ restriction_provider = self.signature.get_restriction_provider()
+ primary_size = restriction_provider.restriction_size(element, primary)
+ residual_size = restriction_provider.restriction_size(element, residual)
return (
- (self.current_windowed_value.with_value((element, primary)),
+ (self.current_windowed_value.with_value(
+ ((element, primary), primary_size)),
None),
- (self.current_windowed_value.with_value((element, residual)),
+ (self.current_windowed_value.with_value(
+ ((element, residual), residual_size)),
restriction_tracker.current_watermark()))
def current_element_progress(self):
@@ -745,11 +752,8 @@
except BaseException as exn:
self._reraise_augmented(exn)
- def finalize(self):
- self.bundle_finalizer_param.finalize_bundle()
-
- def process_with_restriction(self, windowed_value):
- element, restriction = windowed_value.value
+ def process_with_sized_restriction(self, windowed_value):
+ (element, restriction), _ = windowed_value.value
return self.do_fn_invoker.invoke_process(
windowed_value.with_value(element),
restriction_tracker=self.do_fn_invoker.invoke_create_tracker(
@@ -780,6 +784,9 @@
def finish(self):
self._invoke_bundle_method(self.do_fn_invoker.invoke_finish_bundle)
+ def finalize(self):
+ self.bundle_finalizer_param.finalize_bundle()
+
def _reraise_augmented(self, exn):
if getattr(exn, '_tagged_with_step', False) or not self.step_name:
raise
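The deferral and split paths above now hand back ((element, restriction), size) instead of (element, restriction), so the runner sees a size for each primary and residual without re-invoking user code. A standalone sketch of that value shape (the record name, offset tuple, and size function are illustrative only):

```python
element = 'record-001'
primary = (0, 60)       # illustrative offset-range restrictions [start, stop)
residual = (60, 100)

def size(restriction):
  # Same arithmetic as the restriction_size providers added in the tests below.
  return restriction[1] - restriction[0]

# The sized values produced for the primary and residual sides of a split.
primary_sized = ((element, primary), size(primary))
residual_sized = ((element, residual), size(residual))
print(primary_sized)   # (('record-001', (0, 60)), 60)
print(residual_sized)  # (('record-001', (60, 100)), 40)
```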
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index b4630af..e140db5 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -717,11 +717,13 @@
from apache_beam.runners.dataflow.internal import apiclient
transform_proto = self.proto_context.transforms.get_proto(transform_node)
transform_id = self.proto_context.transforms.get_id(transform_node)
+ use_fnapi = apiclient._use_fnapi(options)
+ use_unified_worker = apiclient._use_unified_worker(options)
# The data transmitted in SERIALIZED_FN is different depending on whether
# this is a fnapi pipeline or not.
- if (apiclient._use_fnapi(options) and
+ if (use_fnapi and
(transform_proto.spec.urn == common_urns.primitives.PAR_DO.urn or
- apiclient._use_unified_worker(options))):
+ use_unified_worker)):
# Patch side input ids to be unique across a given pipeline.
if (label_renames and
transform_proto.spec.urn == common_urns.primitives.PAR_DO.urn):
@@ -781,6 +783,16 @@
step.add_property(PropertyNames.OUTPUT_INFO, outputs)
+ # Add the restriction encoding if we are a splittable DoFn
+ # and are using the Fn API on the unified worker.
+ from apache_beam.runners.common import DoFnSignature
+ signature = DoFnSignature(transform_node.transform.fn)
+ if (use_fnapi and use_unified_worker and signature.is_splittable_dofn()):
+ restriction_coder = (
+ signature.get_restriction_provider().restriction_coder())
+ step.add_property(PropertyNames.RESTRICTION_ENCODING,
+ self._get_cloud_encoding(restriction_coder, use_fnapi))
+
@staticmethod
def _pardo_fn_data(transform_node, get_label):
transform = transform_node.transform
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index b03a8bb..e0a1a56 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -31,7 +31,6 @@
import sys
import tempfile
import time
-import warnings
from datetime import datetime
from builtins import object
@@ -954,15 +953,7 @@
def _verify_interpreter_version_is_supported(pipeline_options):
- if sys.version_info[0] == 2:
- return
-
- if sys.version_info[0:2] in [(3, 5), (3, 6)]:
- if sys.version_info[0:3] < (3, 5, 3):
- warnings.warn(
- 'You are using an early release for Python 3.5. It is recommended '
- 'to use Python 3.5.3 or higher with Dataflow '
- 'runner.')
+ if sys.version_info[0:2] in [(2, 7), (3, 5), (3, 6), (3, 7)]:
return
debug_options = pipeline_options.view_as(DebugOptions)
@@ -972,8 +963,8 @@
raise Exception(
'Dataflow runner currently supports Python versions '
- '2.7, 3.5 and 3.6. To ignore this requirement and start a job using a '
- 'different version of Python 3 interpreter, pass '
+ '2.7, 3.5, 3.6, and 3.7. To ignore this requirement and start a job '
+ 'using a different version of Python 3 interpreter, pass '
'--experiment ignore_py3_minor_version pipeline option.')
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
index 9e11035..77eba7c 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
@@ -286,26 +286,23 @@
pipeline_options,
'2.0.0', #any environment version
FAKE_PIPELINE_URL)
- if sys.version_info[0:2] == (3, 5):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python3-fnapi:' + names.BEAM_FNAPI_CONTAINER_VERSION))
- elif sys.version_info[0:2] == (3, 6):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python36-fnapi:' + names.BEAM_FNAPI_CONTAINER_VERSION))
- elif sys.version_info[0:2] == (3, 7):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python37-fnapi:' + names.BEAM_FNAPI_CONTAINER_VERSION))
- else:
+ if sys.version_info[0] == 2:
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
'/python-fnapi:' + names.BEAM_FNAPI_CONTAINER_VERSION))
+ elif sys.version_info[0:2] == (3, 5):
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python3-fnapi:' + names.BEAM_FNAPI_CONTAINER_VERSION))
+ else:
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python%d%d-fnapi:%s' % (sys.version_info[0],
+ sys.version_info[1],
+ names.BEAM_FNAPI_CONTAINER_VERSION)))
# batch, legacy pipeline.
pipeline_options = PipelineOptions(
@@ -314,26 +311,23 @@
pipeline_options,
'2.0.0', #any environment version
FAKE_PIPELINE_URL)
- if sys.version_info[0:2] == (3, 5):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python3:' + names.BEAM_CONTAINER_VERSION))
- elif sys.version_info[0:2] == (3, 6):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python36:' + names.BEAM_CONTAINER_VERSION))
- elif sys.version_info[0:2] == (3, 7):
- self.assertEqual(
- env.proto.workerPools[0].workerHarnessContainerImage,
- (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python37:' + names.BEAM_CONTAINER_VERSION))
- else:
+ if sys.version_info[0] == 2:
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
'/python:' + names.BEAM_CONTAINER_VERSION))
+ elif sys.version_info[0:2] == (3, 5):
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python3:' + names.BEAM_CONTAINER_VERSION))
+ else:
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python%d%d:%s' % (sys.version_info[0],
+ sys.version_info[1],
+ names.BEAM_CONTAINER_VERSION)))
@mock.patch('apache_beam.runners.dataflow.internal.apiclient.'
'beam_version.__version__', '2.2.0')
@@ -345,7 +339,12 @@
pipeline_options,
'2.0.0', #any environment version
FAKE_PIPELINE_URL)
- if sys.version_info[0] == 3:
+ if sys.version_info[0] == 2:
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python-fnapi:2.2.0'))
+ elif sys.version_info[0:2] == (3, 5):
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
@@ -354,7 +353,8 @@
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python-fnapi:2.2.0'))
+ '/python%d%d-fnapi:2.2.0' % (sys.version_info[0],
+ sys.version_info[1])))
# batch, legacy pipeline.
pipeline_options = PipelineOptions(
@@ -363,7 +363,12 @@
pipeline_options,
'2.0.0', #any environment version
FAKE_PIPELINE_URL)
- if sys.version_info[0] == 3:
+ if sys.version_info[0] == 2:
+ self.assertEqual(
+ env.proto.workerPools[0].workerHarnessContainerImage,
+ (names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
+ '/python:2.2.0'))
+ elif sys.version_info[0:2] == (3, 5):
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
@@ -372,7 +377,8 @@
self.assertEqual(
env.proto.workerPools[0].workerHarnessContainerImage,
(names.DATAFLOW_CONTAINER_IMAGE_REPOSITORY +
- '/python:2.2.0'))
+ '/python%d%d:2.2.0' % (sys.version_info[0],
+ sys.version_info[1])))
def test_worker_harness_override_takes_precedence_over_sdk_defaults(self):
# streaming, fnapi pipeline.
@@ -456,7 +462,7 @@
@mock.patch(
'apache_beam.runners.dataflow.internal.apiclient.sys.version_info',
- (2, 333))
+ (3, 5))
def test_get_python_sdk_name(self):
pipeline_options = PipelineOptions(
['--project', 'test_project', '--job_name', 'test_job_name',
@@ -465,7 +471,7 @@
'--experiments', 'use_multiple_sdk_containers'])
environment = apiclient.Environment(
[], pipeline_options, 1, FAKE_PIPELINE_URL)
- self.assertEqual('Apache Beam Python 2.333 SDK',
+ self.assertEqual('Apache Beam Python 3.5 SDK',
environment._get_python_sdk_name())
@mock.patch(
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/names.py b/sdks/python/apache_beam/runners/dataflow/internal/names.py
index 560bd3d..2fd4bba 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/names.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/names.py
@@ -109,6 +109,7 @@
PUBSUB_SUBSCRIPTION = 'pubsub_subscription'
PUBSUB_TIMESTAMP_ATTRIBUTE = 'pubsub_timestamp_label'
PUBSUB_TOPIC = 'pubsub_topic'
+ RESTRICTION_ENCODING = 'restriction_encoding'
SERIALIZED_FN = 'serialized_fn'
SHARD_NAME_TEMPLATE = 'shard_template'
SOURCE_STEP_INPUT = 'custom_source_step_input'
diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py b/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
index 8a6f422..7bbc16d 100644
--- a/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
+++ b/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
@@ -379,6 +379,9 @@
middle = (end - start) // 2
return [(start, middle), (middle, end)]
+ def restriction_size(self, element, restriction):
+ return restriction[1] - restriction[0]
+
class ExpandStringsDoFn(beam.DoFn):
def process(self, element, restriction_tracker=ExpandStringsProvider()):
assert isinstance(
@@ -1035,6 +1038,9 @@
# Don't do any initial splitting to simplify test.
return [restriction]
+ def restriction_size(self, element, restriction):
+ return restriction[1] - restriction[0]
+
class EnumerateSdf(beam.DoFn):
def process(self, element, restriction_tracker=EnumerateProvider()):
to_emit = []
diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py b/sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py
index 8667a8e..f1aef40 100644
--- a/sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py
+++ b/sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py
@@ -51,6 +51,8 @@
common_urns.primitives.PAR_DO.urn,
common_urns.sdf_components.PAIR_WITH_RESTRICTION.urn,
common_urns.sdf_components.SPLIT_RESTRICTION.urn,
+ common_urns.sdf_components.SPLIT_AND_SIZE_RESTRICTIONS.urn,
+ common_urns.sdf_components.PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS.urn,
common_urns.sdf_components.PROCESS_ELEMENTS.urn])
IMPULSE_BUFFER = b'impulse'
@@ -740,6 +742,7 @@
main_input_id = transform.inputs[main_input_tag]
element_coder_id = context.components.pcollections[
main_input_id].coder_id
+ # KV[element, restriction]
paired_coder_id = context.add_or_get_coder_id(
beam_runner_api_pb2.Coder(
spec=beam_runner_api_pb2.SdkFunctionSpec(
@@ -747,6 +750,18 @@
urn=common_urns.coders.KV.urn)),
component_coder_ids=[element_coder_id,
pardo_payload.restriction_coder_id]))
+ # KV[KV[element, restriction], double]
+ sized_coder_id = context.add_or_get_coder_id(
+ beam_runner_api_pb2.Coder(
+ spec=beam_runner_api_pb2.SdkFunctionSpec(
+ spec=beam_runner_api_pb2.FunctionSpec(
+ urn=common_urns.coders.KV.urn)),
+ component_coder_ids=[
+ paired_coder_id,
+ context.add_or_get_coder_id(
+ coders.FloatCoder().to_runner_api(None),
+ 'doubles_coder')
+ ]))
paired_pcoll_id = copy_like(
context.components.pcollections,
@@ -764,12 +779,12 @@
context.components.pcollections,
main_input_id,
'_split',
- coder_id=paired_coder_id)
+ coder_id=sized_coder_id)
split_transform_id = copy_like(
context.components.transforms,
transform,
- unique_name=transform.unique_name + '/SplitRestriction',
- urn=common_urns.sdf_components.SPLIT_RESTRICTION.urn,
+ unique_name=transform.unique_name + '/SplitAndSizeRestriction',
+ urn=common_urns.sdf_components.SPLIT_AND_SIZE_RESTRICTIONS.urn,
inputs=dict(transform.inputs, **{main_input_tag: paired_pcoll_id}),
outputs={'out': split_pcoll_id})
@@ -777,7 +792,9 @@
context.components.transforms,
transform,
unique_name=transform.unique_name + '/Process',
- urn=common_urns.sdf_components.PROCESS_ELEMENTS.urn,
+ urn=
+ common_urns.sdf_components.PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS
+ .urn,
inputs=dict(transform.inputs, **{main_input_tag: split_pcoll_id}))
yield make_stage(stage, pair_transform_id)
diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor.py b/sdks/python/apache_beam/runners/worker/bundle_processor.py
index 06b0322..583f1bb 100644
--- a/sdks/python/apache_beam/runners/worker/bundle_processor.py
+++ b/sdks/python/apache_beam/runners/worker/bundle_processor.py
@@ -933,7 +933,7 @@
beam_runner_api_pb2.ParDoPayload)
def create(*args):
- class CreateRestriction(beam.DoFn):
+ class PairWithRestriction(beam.DoFn):
def __init__(self, fn, restriction_provider):
self.restriction_provider = restriction_provider
@@ -946,28 +946,29 @@
# that can be distributed.)
yield element, self.restriction_provider.initial_restriction(element)
- return _create_sdf_operation(CreateRestriction, *args)
+ return _create_sdf_operation(PairWithRestriction, *args)
@BeamTransformFactory.register_urn(
- common_urns.sdf_components.SPLIT_RESTRICTION.urn,
+ common_urns.sdf_components.SPLIT_AND_SIZE_RESTRICTIONS.urn,
beam_runner_api_pb2.ParDoPayload)
def create(*args):
- class SplitRestriction(beam.DoFn):
+ class SplitAndSizeRestrictions(beam.DoFn):
def __init__(self, fn, restriction_provider):
self.restriction_provider = restriction_provider
def process(self, element_restriction, *args, **kwargs):
element, restriction = element_restriction
- for part in self.restriction_provider.split(element, restriction):
- yield element, part
+ for part, size in self.restriction_provider.split_and_size(
+ element, restriction):
+ yield ((element, part), size)
- return _create_sdf_operation(SplitRestriction, *args)
+ return _create_sdf_operation(SplitAndSizeRestrictions, *args)
@BeamTransformFactory.register_urn(
- common_urns.sdf_components.PROCESS_ELEMENTS.urn,
+ common_urns.sdf_components.PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS.urn,
beam_runner_api_pb2.ParDoPayload)
def create(factory, transform_id, transform_proto, parameter, consumers):
assert parameter.do_fn.spec.urn == python_urns.PICKLED_DOFN_INFO
@@ -975,7 +976,7 @@
return _create_pardo_operation(
factory, transform_id, transform_proto, consumers,
serialized_fn, parameter,
- operation_cls=operations.SdfProcessElements)
+ operation_cls=operations.SdfProcessSizedElements)
def _create_sdf_operation(
diff --git a/sdks/python/apache_beam/runners/worker/operations.pxd b/sdks/python/apache_beam/runners/worker/operations.pxd
index d924b29..a947043 100644
--- a/sdks/python/apache_beam/runners/worker/operations.pxd
+++ b/sdks/python/apache_beam/runners/worker/operations.pxd
@@ -97,7 +97,7 @@
cdef public object input_info
-cdef class SdfProcessElements(DoOperation):
+cdef class SdfProcessSizedElements(DoOperation):
cdef object lock
cdef object element_start_output_bytes
diff --git a/sdks/python/apache_beam/runners/worker/operations.py b/sdks/python/apache_beam/runners/worker/operations.py
index 657efb7..8478866 100644
--- a/sdks/python/apache_beam/runners/worker/operations.py
+++ b/sdks/python/apache_beam/runners/worker/operations.py
@@ -612,10 +612,10 @@
return infos
-class SdfProcessElements(DoOperation):
+class SdfProcessSizedElements(DoOperation):
def __init__(self, *args, **kwargs):
- super(SdfProcessElements, self).__init__(*args, **kwargs)
+ super(SdfProcessSizedElements, self).__init__(*args, **kwargs)
self.lock = threading.RLock()
self.element_start_output_bytes = None
@@ -628,7 +628,7 @@
receiver.opcounter.restart_sampling()
# Actually processing the element can be expensive; do it without
# the lock.
- delayed_application = self.dofn_runner.process_with_restriction(o)
+ delayed_application = self.dofn_runner.process_with_sized_restriction(o)
if delayed_application:
self.execution_context.delayed_applications.append(
(self, delayed_application))
@@ -650,7 +650,7 @@
def progress_metrics(self):
with self.lock:
- metrics = super(SdfProcessElements, self).progress_metrics()
+ metrics = super(SdfProcessSizedElements, self).progress_metrics()
current_element_progress = self.current_element_progress()
if current_element_progress:
metrics.active_elements.measured.input_element_counts[
diff --git a/sdks/python/apache_beam/testing/data/standard_coders.yaml b/sdks/python/apache_beam/testing/data/standard_coders.yaml
index 494e749..9eb4195 100644
--- a/sdks/python/apache_beam/testing/data/standard_coders.yaml
+++ b/sdks/python/apache_beam/testing/data/standard_coders.yaml
@@ -193,3 +193,19 @@
pane: {is_first: True, is_last: True, timing: UNKNOWN, index: 0, on_time_index: 0},
windows: [{end: 1454293425000, span: 3600000}, {end: -9223372036854410, span: 365}]
}
+
+---
+
+coder:
+ urn: "beam:coder:double:v1"
+examples:
+ "\0\0\0\0\0\0\0\0": "0"
+ "\u0080\0\0\0\0\0\0\0": "-0"
+ "\u003f\u00b9\u0099\u0099\u0099\u0099\u0099\u009a": "0.1"
+ "\u00bf\u00b9\u0099\u0099\u0099\u0099\u0099\u009a": "-0.1"
+ "\0\0\0\0\0\0\0\u0001": "4.9e-324"
+ "\0\u0001\0\0\0\0\0\0": "1.390671161567e-309"
+ "\u007f\u00ef\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff": "1.7976931348623157e308"
+ "\u007f\u00f0\0\0\0\0\0\0": "Infinity"
+ "\u00ff\u00f0\0\0\0\0\0\0": "-Infinity"
+ "\u007f\u00f8\0\0\0\0\0\0": "NaN"
diff --git a/sdks/python/apache_beam/transforms/core.py b/sdks/python/apache_beam/transforms/core.py
index 1d095e8..0ee672f 100644
--- a/sdks/python/apache_beam/transforms/core.py
+++ b/sdks/python/apache_beam/transforms/core.py
@@ -196,14 +196,16 @@
an instance of ``RestrictionProvider``.
The provided ``RestrictionProvider`` instance must provide suitable overrides
- for the following methods.
+ for the following methods:
* create_tracker()
* initial_restriction()
Optionally, ``RestrictionProvider`` may override default implementations of
- following methods.
+ following methods:
* restriction_coder()
+ * restriction_size()
* split()
+ * split_and_size()
** Pausing and resuming processing of an element **
@@ -253,9 +255,6 @@
reading input element for each of the returned restrictions should be the
same as the total set of elements produced by reading the input element for
the input restriction.
-
- TODO(chamikara): give suitable hints for performing splitting, for example
- number of parts or size in bytes.
"""
yield restriction
@@ -270,6 +269,20 @@
"""
return coders.registry.get_coder(object)
+ def restriction_size(self, element, restriction):
+ """Returns the size of an element with respect to the given element.
+
+ By default, asks a newly-created restriction tracker for the default size
+ of the restriction.
+ """
+ return self.create_tracker(restriction).default_size()
+
+ def split_and_size(self, element, restriction):
+ """Like split, but also does sizing, returning (restriction, size) pairs.
+ """
+ for part in self.split(element, restriction):
+ yield part, self.restriction_size(element, part)
+
def get_function_arguments(obj, func):
"""Return the function arguments based on the name provided. If they have
@@ -398,7 +411,7 @@
If specified, following default arguments are used by the ``DoFnRunner`` to
be able to pass the parameters correctly.
- ``DoFn.ElementParam``: element to be processed.
+ ``DoFn.ElementParam``: element to be processed, should not be mutated.
``DoFn.SideInputParam``: a side input that may be used when processing.
``DoFn.TimestampParam``: timestamp of the input element.
``DoFn.WindowParam``: ``Window`` the input element belongs to.
@@ -597,20 +610,21 @@
"""
raise NotImplementedError(str(self))
- def add_input(self, accumulator, element, *args, **kwargs):
+ def add_input(self, mutable_accumulator, element, *args, **kwargs):
"""Return result of folding element into accumulator.
CombineFn implementors must override add_input.
Args:
- accumulator: the current accumulator
- element: the element to add
+ mutable_accumulator: the current accumulator,
+ may be modified and returned for efficiency
+ element: the element to add, should not be mutated
*args: Additional arguments and side inputs.
**kwargs: Additional arguments and side inputs.
"""
raise NotImplementedError(str(self))
- def add_inputs(self, accumulator, elements, *args, **kwargs):
+ def add_inputs(self, mutable_accumulator, elements, *args, **kwargs):
"""Returns the result of folding each element in elements into accumulator.
This is provided in case the implementation affords more efficient
@@ -618,21 +632,27 @@
over the inputs invoking add_input for each one.
Args:
- accumulator: the current accumulator
- elements: the elements to add
+ mutable_accumulator: the current accumulator,
+ may be modified and returned for efficiency
+ elements: the elements to add, should not be mutated
*args: Additional arguments and side inputs.
**kwargs: Additional arguments and side inputs.
"""
for element in elements:
- accumulator = self.add_input(accumulator, element, *args, **kwargs)
- return accumulator
+ mutable_accumulator =\
+ self.add_input(mutable_accumulator, element, *args, **kwargs)
+ return mutable_accumulator
def merge_accumulators(self, accumulators, *args, **kwargs):
"""Returns the result of merging several accumulators
to a single accumulator value.
Args:
- accumulators: the accumulators to merge
+ accumulators: the accumulators to merge.
+ Only the first accumulator may be modified and returned for efficiency;
+ the other accumulators should not be mutated, because they may be
+ shared with other code and mutating them could lead to incorrect
+ results or data corruption.
*args: Additional arguments and side inputs.
**kwargs: Additional arguments and side inputs.
"""
@@ -1246,7 +1266,7 @@
if (output_hint is None
and get_type_hints(wrapper).input_types
and get_type_hints(wrapper).input_types[0]):
- output_hint = get_type_hints(wrapper).input_types[0]
+ output_hint = get_type_hints(wrapper).input_types[0][0]
if output_hint:
get_type_hints(wrapper).set_output_types(typehints.Iterable[output_hint])
# pylint: disable=protected-access
diff --git a/sdks/python/apache_beam/typehints/typed_pipeline_test.py b/sdks/python/apache_beam/typehints/typed_pipeline_test.py
index b630307..2e461f9 100644
--- a/sdks/python/apache_beam/typehints/typed_pipeline_test.py
+++ b/sdks/python/apache_beam/typehints/typed_pipeline_test.py
@@ -101,6 +101,13 @@
with self.assertRaises(typehints.TypeCheckError):
[1, 2, 3] | (beam.ParDo(my_do_fn) | 'again' >> beam.ParDo(my_do_fn))
+ def test_filter_type_hint(self):
+ @typehints.with_input_types(int)
+ def filter_fn(data):
+ return data % 2
+
+ self.assertEquals([1, 3], [1, 2, 3] | beam.Filter(filter_fn))
+
class NativeTypesTest(unittest.TestCase):
diff --git a/sdks/python/test-suites/dataflow/py3/build.gradle b/sdks/python/test-suites/dataflow/py3/build.gradle
index 15e2a49..d83b01c 100644
--- a/sdks/python/test-suites/dataflow/py3/build.gradle
+++ b/sdks/python/test-suites/dataflow/py3/build.gradle
@@ -20,7 +20,7 @@
applyPythonNature()
// Required to setup a Python 3 virtualenv.
-project.ext.python3 = true
+pythonVersion = '3.5'
def runScriptsDir = "${project.rootDir}/sdks/python/scripts"
diff --git a/sdks/python/test-suites/direct/py3/build.gradle b/sdks/python/test-suites/direct/py3/build.gradle
index a1321ee..c9e3ed8 100644
--- a/sdks/python/test-suites/direct/py3/build.gradle
+++ b/sdks/python/test-suites/direct/py3/build.gradle
@@ -20,7 +20,7 @@
applyPythonNature()
// Required to setup a Python 3 virtualenv.
-project.ext.python3 = true
+pythonVersion = '3.5'
def runScriptsDir = "${project.rootDir}/sdks/python/scripts"
diff --git a/sdks/python/test-suites/tox/py35/build.gradle b/sdks/python/test-suites/tox/py35/build.gradle
index 5f86fe9..ca3d37c 100644
--- a/sdks/python/test-suites/tox/py35/build.gradle
+++ b/sdks/python/test-suites/tox/py35/build.gradle
@@ -24,7 +24,7 @@
applyPythonNature()
// Required to setup a Python 3 virtualenv.
-project.ext.python3 = true
+pythonVersion = '3.5'
task lint {}
check.dependsOn lint
diff --git a/sdks/python/test-suites/tox/py36/build.gradle b/sdks/python/test-suites/tox/py36/build.gradle
index 8dc497d..c1615ef 100644
--- a/sdks/python/test-suites/tox/py36/build.gradle
+++ b/sdks/python/test-suites/tox/py36/build.gradle
@@ -23,6 +23,9 @@
plugins { id 'org.apache.beam.module' }
applyPythonNature()
+// Required to setup a Python 3 virtualenv.
+pythonVersion = '3.6'
+
toxTask "testPython36", "py36"
test.dependsOn testPython36
diff --git a/website/build.gradle b/website/build.gradle
index f0cab59..175cff2 100644
--- a/website/build.gradle
+++ b/website/build.gradle
@@ -187,7 +187,6 @@
}
testWebsite.dependsOn 'buildLocalWebsite'
-check.dependsOn testWebsite
task preCommit {
dependsOn testWebsite
diff --git a/website/src/contribute/release-guide.md b/website/src/contribute/release-guide.md
index 06fb2c4..4467533 100644
--- a/website/src/contribute/release-guide.md
+++ b/website/src/contribute/release-guide.md
@@ -409,7 +409,7 @@
The build with `-PisRelease` creates the combined Javadoc for the release in `sdks/java/javadoc`.
-The file `sdks/java/javadoc/ant.xml` file contains a list of modules to include
+The file `sdks/java/javadoc/build.gradle` contains a list of modules to include
and exclude, plus a list of offline URLs that populate links from Beam's
Javadoc to the Javadoc for other modules that Beam depends on.
@@ -463,18 +463,19 @@
### Checklist to proceed to the next step
-1. Release Manager’s GPG key is published to `dist.apache.org`
-2. Release Manager’s GPG key is configured in `git` configuration
-3. Release Manager has `org.apache.beam` listed under `Staging Profiles` in Nexus
-4. Release Manager’s Nexus User Token is configured in `settings.xml`
-5. JIRA release item for the subsequent release has been created
-6. All test failures from branch verification have associated JIRA issues
-7. There are no release blocking JIRA issues
-8. Release Notes in JIRA have been audited and adjusted
-9. Combined javadoc has the appropriate contents.
-10. Release branch has been created
-11. Originating branch has the version information updated to the new version
-12. Nightly snapshot is in progress (do revisit it continually)
+* Release Manager’s GPG key is published to `dist.apache.org`
+* Release Manager’s GPG key is configured in `git` configuration
+* Release Manager has `org.apache.beam` listed under `Staging Profiles` in Nexus
+* Release Manager’s Nexus User Token is configured in `settings.xml`
+* JIRA release item for the subsequent release has been created
+* All test failures from branch verification have associated JIRA issues
+* There are no release blocking JIRA issues
+* Release Notes in JIRA have been audited and adjusted
+* Combined javadoc has the appropriate contents.
+* Release branch has been created
+* There are no open pull requests to release branch
+* Originating branch has the version information updated to the new version
+* Nightly snapshot is in progress (do revisit it continually)
**********
diff --git a/website/src/documentation/io/built-in-hadoop.md b/website/src/documentation/io/built-in-hadoop.md
index 37ae66fe..fd330ec 100644
--- a/website/src/documentation/io/built-in-hadoop.md
+++ b/website/src/documentation/io/built-in-hadoop.md
@@ -376,7 +376,7 @@
MyDbOutputFormatKeyClass, Object.class);
myHadoopConfiguration.setClass("mapreduce.job.output.value.class",
MyDbOutputFormatValueClass, Object.class);
-myHadoopConfiguration.setClass("mapreduce.job.output.value.class",
+myHadoopConfiguration.setClass("mapreduce.job.partitioner.class",
MyPartitionerClass, Object.class);
myHadoopConfiguration.setInt("mapreduce.job.reduces", 2);
```
diff --git a/website/src/feed.xml b/website/src/feed.xml
index 2d28757..ccb6f51 100644
--- a/website/src/feed.xml
+++ b/website/src/feed.xml
@@ -1,6 +1,7 @@
---
layout: null
---
+<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -12,7 +13,6 @@
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
-<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>{{ site.title | xml_escape }}</title>
diff --git a/website/src/get-started/mobile-gaming-example.md b/website/src/get-started/mobile-gaming-example.md
index 39d1480..6d1c8f1 100644
--- a/website/src/get-started/mobile-gaming-example.md
+++ b/website/src/get-started/mobile-gaming-example.md
@@ -144,7 +144,7 @@
The `HourlyTeamScore` pipeline expands on the basic batch analysis principles used in the `UserScore` pipeline and improves upon some of its limitations. `HourlyTeamScore` performs finer-grained analysis, both by using additional features in the Beam SDKs, and taking into account more aspects of the game data. For example, `HourlyTeamScore` can filter out data that isn't part of the relevant analysis period.
-Like `UserScore`, `HourlyTeamScore` is best thought of as a job to be run periodically after all the relevant data has been gathered (such as once per day). The pipeline reads a fixed data set from a file, and writes the results to a Google Cloud BigQuery table.
+Like `UserScore`, `HourlyTeamScore` is best thought of as a job to be run periodically after all the relevant data has been gathered (such as once per day). The pipeline reads a fixed data set from a file, and writes the results <span class="language-java">back to a text file</span><span class="language-py">to a Google Cloud BigQuery table</span>.
{:.language-java}
> **Note:** See [HourlyTeamScore on GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java) for the complete example pipeline program.
@@ -407,4 +407,4 @@
* Dive in to some of our favorite [Videos and Podcasts]({{ site.baseurl }}/documentation/resources/videos-and-podcasts).
* Join the Beam [users@]({{ site.baseurl }}/community/contact-us) mailing list.
-Please don't hesitate to [reach out]({{ site.baseurl }}/community/contact-us) if you encounter any issues!
\ No newline at end of file
+Please don't hesitate to [reach out]({{ site.baseurl }}/community/contact-us) if you encounter any issues!