commit | fdc40b0bb87be820bf671b26514b770b852c63da | |
---|---|---|
author | Jia Yu <jiayu2@asu.edu> | Sun Dec 10 05:34:13 2017 -0700 |
committer | GitHub <noreply@github.com> | Sun Dec 10 05:34:13 2017 -0700 |
tree | d71c08908a324984229d473f8f126d81e8c1ce16 | |
parent | aa9d1a53893799b21b41f416c6d3d343e1318481 | |
parent | d4e0a6942148c3944beb84d76866dbdaeeab2093 | |
Merge pull request #169 from jiayuasu/master Update README
Status | Stable | Latest | Source code | Spark compatibility |
---|---|---|---|---|
GeoSpark | | | | Spark 2.X, 1.X |
GeoSparkSQL | | | | Spark SQL 2.1 |
GeoSparkViz | | | | Spark 2.X, 1.X |
GeoSpark@Twitter | GeoSpark Discussion Board
GeoSpark is listed as an Infrastructure Project on the Apache Spark official Third Party Projects page.
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) that efficiently load, process, and analyze large-scale spatial data across machines. GeoSpark provides APIs that let Apache Spark programmers easily develop spatial analysis programs with SRDDs, which have built-in support for geometrical operations and spatial queries (Range, K Nearest Neighbors, Join).
GeoSpark artifacts are hosted in Maven Central: Maven Central Coordinates
GeoSparkSQL fully supports Apache Spark SQL. Features are as follows:
Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD
The generic SpatialRDD supports heterogeneous geometries:
Native input format support:
User-supplied input format mapper: Any single-line input formats
Supported Spatial Partitioning techniques: Quad-Tree (recommended), KDB-Tree (recommended), R-Tree, Voronoi diagram, Uniform grids (Experimental), Hilbert Curve (Experimental)
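To make the idea of spatial partitioning concrete, here is a minimal, self-contained Java sketch of the simplest technique in the list, a uniform grid. It does not use GeoSpark's API and is not GeoSpark's implementation; the class `UniformGridSketch` and its methods are hypothetical names invented for this illustration. Points are assumed to be `double[]{x, y}` arrays, and the dataset extent is assumed to be known in advance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not GeoSpark code. All names here are hypothetical.
final class UniformGridSketch {

    // Maps a point to the id of the grid cell that contains it.
    // The extent [minX, maxX] x [minY, maxY] is split into cols x rows equal cells.
    static int cellId(double x, double y,
                      double minX, double minY, double maxX, double maxY,
                      int cols, int rows) {
        int col = (int) ((x - minX) / (maxX - minX) * cols);
        int row = (int) ((y - minY) / (maxY - minY) * rows);
        // Points on the upper boundary (or outside the extent) are clamped into range.
        col = Math.min(Math.max(col, 0), cols - 1);
        row = Math.min(Math.max(row, 0), rows - 1);
        return row * cols + col;
    }

    // Groups points by cell id: one bucket per grid cell, i.e. per partition.
    static Map<Integer, List<double[]>> partition(List<double[]> points,
                                                  double minX, double minY,
                                                  double maxX, double maxY,
                                                  int cols, int rows) {
        Map<Integer, List<double[]>> buckets = new HashMap<>();
        for (double[] p : points) {
            int id = cellId(p[0], p[1], minX, minY, maxX, maxY, cols, rows);
            buckets.computeIfAbsent(id, k -> new ArrayList<>()).add(p);
        }
        return buckets;
    }
}
```

A uniform grid is cheap to compute, but skewed data can leave some cells nearly empty and others overloaded, which is one common reason to prefer adaptive structures such as quad-trees.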
Supported Spatial Indexes: Quad-Tree and R-Tree. R-Tree supports Spatial K Nearest Neighbors query.
DatasetBoundary, Minimum Bounding Rectangle, Polygon Union
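As a small illustration of what a minimum bounding rectangle is, the following Java sketch computes the smallest axis-aligned rectangle enclosing a set of points. It is a plain single-machine example, not GeoSpark's implementation; `BoundingBoxSketch` and `mbr` are hypothetical names, and points are assumed to be `double[]{x, y}` arrays.

```java
import java.util.List;

// Illustrative only: not GeoSpark code. All names here are hypothetical.
final class BoundingBoxSketch {

    // Returns {minX, minY, maxX, maxY} for a non-empty list of {x, y} points.
    static double[] mbr(List<double[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cannot bound an empty point set");
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[]{minX, minY, maxX, maxY};
    }
}
```

Because minimum and maximum are associative, the same computation can be run per partition and the partial rectangles merged, which is what makes it a natural fit for a distributed dataset.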
Spatial Range Query, Distance Join Query, Spatial Join Query (Inside and Overlap), and Spatial K Nearest Neighbors Query.
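The semantics of two of these queries can be sketched in a few lines of plain Java. This is a brute-force, single-machine illustration under stated assumptions, not GeoSpark's API or algorithm: `SpatialQuerySketch`, `rangeQuery`, and `kNearest` are hypothetical names, points are `double[]{x, y}` arrays, the query window is a closed rectangle, and distance is Euclidean.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: not GeoSpark code. All names here are hypothetical.
final class SpatialQuerySketch {

    // Range query: returns the points inside the closed rectangle
    // [minX, maxX] x [minY, maxY], boundary included.
    static List<double[]> rangeQuery(List<double[]> points,
                                     double minX, double minY,
                                     double maxX, double maxY) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                hits.add(p);
            }
        }
        return hits;
    }

    // K nearest neighbors: returns up to k points closest to (qx, qy), nearest first.
    static List<double[]> kNearest(List<double[]> points, double qx, double qy, int k) {
        List<double[]> sorted = new ArrayList<>(points);
        // Squared distance preserves the ordering and avoids a square root.
        sorted.sort(Comparator.comparingDouble((double[] p) ->
                (p[0] - qx) * (p[0] - qx) + (p[1] - qy) * (p[1] - qy)));
        int count = Math.max(0, Math.min(k, sorted.size()));
        return new ArrayList<>(sorted.subList(0, count));
    }
}
```

Both methods scan every point, so their cost grows with the size of the dataset. Spatial partitioning and spatial indexes exist to avoid exactly this: they let a query skip the regions that cannot contain a result.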
GeoSpark allows users to transform the original coordinate reference system (CRS), e.g., degree-based coordinates such as EPSG:4326 (also known as WGS84), to any other CRS, e.g., meter-based coordinates such as EPSG:3857, so that it can accurately process both geographic and geometrical data. Please specify your desired CRS in the GeoSpark Spatial RDD constructor (Example).
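To show what such a transformation does numerically, here is a minimal Java sketch of the specific EPSG:4326 to EPSG:3857 case, using the standard spherical Web Mercator formulas. It is a hand-written illustration for this one pair of systems, not how GeoSpark performs the conversion; `WebMercatorSketch` and `toWebMercator` are hypothetical names.

```java
// Illustrative only: not GeoSpark code. All names here are hypothetical.
final class WebMercatorSketch {

    // Sphere radius used by EPSG:3857, equal to the WGS84 semi-major axis in meters.
    static final double EARTH_RADIUS_M = 6378137.0;

    // Converts longitude/latitude in degrees (EPSG:4326) to x/y in meters (EPSG:3857).
    // The projection is undefined at the poles, so latitudes should stay strictly
    // between -90 and 90 degrees.
    static double[] toWebMercator(double lonDeg, double latDeg) {
        double x = EARTH_RADIUS_M * Math.toRadians(lonDeg);
        double y = EARTH_RADIUS_M
                * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg) / 2));
        return new double[]{x, y};
    }

    public static void main(String[] args) {
        // Example input: a point at longitude -111.93, latitude 33.42.
        double[] xy = toWebMercator(-111.93, 33.42);
        System.out.println("x = " + xy[0] + " m, y = " + xy[1] + " m");
    }
}
```

After the conversion, coordinates are expressed in meters rather than degrees, so thresholds for operations such as a distance join can be given in meters.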
Please make a Pull Request to add yourself!
The full GeoSpark tutorial is available at the GeoSpark GitHub Wiki: GeoSpark GitHub Wiki
The GeoSpark Scala and Java template project is available here: Template Project
GeoSpark Function Use Cases: Scala Example, Java Example
GeoSparkViz is a large-scale in-memory geospatial visualization system.
GeoSparkViz provides native support for general cartographic design by extending GeoSpark to process large-scale spatial data. It can visualize Spatial RDDs and Spatial Queries and render very high-resolution images in parallel.
More details are available here: GeoSpark Visualization Extension
Watch High Resolution on a real map
Jia Yu, Jinxuan Wu, Mohamed Sarwat. “A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data”. (demo paper) In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, FI, May 2016
Jia Yu, Jinxuan Wu, Mohamed Sarwat. “GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data”. (short paper) In Proceedings of the ACM International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015, Seattle, WA, USA, November 2015
You are welcome to use GeoSpark for benchmarking purposes. To achieve the best performance or enjoy all features of GeoSpark,
Currently, we have published two papers about GeoSpark. Only these two papers are associated with the GeoSpark Development Team.
Jia Yu (Email: jiayu2@asu.edu)
Mohamed Sarwat (Email: msarwat@asu.edu)
Please visit the GeoSpark project website for the latest news and releases.
GeoSpark is one of the projects initiated by the Data Systems Lab at Arizona State University. The mission of the Data Systems Lab is to design and develop experimental data management systems (e.g., database systems).