
Geode partitioned region example

This example demonstrates the basic property of partitioning: data entries are distributed across all servers that host the region. The distribution is similar to database sharding, except that it occurs automatically.

In this example, two servers host a single partitioned region with no redundancy, so that the basic property of partitioning may be observed. The Producer code puts 10 entries into the region, and the Consumer gets and prints them. Because the region is partitioned, its entries are distributed between the two servers hosting the region. Since the data in the region is not redundant, when one of the servers stops, the entries hosted on that server are lost; the example demonstrates this.
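
For orientation, the following is a minimal sketch of the kind of client code this example uses, written against the Geode Java client API. It is not the example's actual Producer source: the locator address and port (the Geode default, 10334) are assumptions about what scripts/startAll.sh starts, and plain String keys and values stand in for the example's EmployeeKey and EmployeeData classes.

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;

    public class ProducerSketch {
      public static void main(String[] args) {
        // Connect to the cluster through the locator (address and port assumed).
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("127.0.0.1", 10334)
            .create();

        // PROXY keeps no data on the client; every put is forwarded to the servers,
        // which hash the key into a bucket and store the entry on whichever server
        // owns that bucket.
        Region<String, String> employees = cache
            .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
            .create("EmployeeRegion");

        for (int i = 0; i < 10; i++) {
          employees.put("employee" + i, "data" + i);
        }
        System.out.println("Inserted 10 entries in EmployeeRegion.");

        cache.close();
      }
    }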

Demonstration of Partitioning

  1. Set directory geode-examples/partitioned to be the current working directory. Each step in this example specifies paths relative to that directory.

  2. Build the jar (with the EmployeeKey and EmployeeData classes):

    $ ../gradlew build
    
  3. Run a script that starts a locator and two servers. The built JAR will be placed onto the classpath when the script starts the servers:

    $ scripts/startAll.sh
    

    Each of the servers hosts the partitioned region.

  4. Run the producer to put 10 entries into the EmployeeRegion.

    $ ../gradlew run -Pmain=Producer
    ...
    ... 
    INFO: Inserted 10 entries in EmployeeRegion.
    
  5. There are several ways to see the contents of the region. Run the consumer to get and print all 10 entries in the EmployeeRegion. (A minimal client-side sketch of this kind of check appears after these steps.)

    $ ../gradlew run -Pmain=Consumer
    ...
    INFO: 10 entries in EmployeeRegion on the server(s).
    ...
    

    If desired, use a gfsh query to see the keys in the region:

    $ $GEODE_HOME/bin/gfsh
    ...
    gfsh>connect
    gfsh>query --query="select e.key from /EmployeeRegion.entries e"
    ...
    

    Note that the quantity of entries may also be observed with gfsh:

    gfsh>describe region --name=EmployeeRegion
    ..........................................................
    Name            : EmployeeRegion
    Data Policy     : partition
    Hosting Members : server2
                      server1
    
    Non-Default Attributes Shared By Hosting Members  
    
     Type  |    Name     | Value
    ------ | ----------- | ---------
    Region | size        | 10
           | data-policy | PARTITION
    

    As an alternative, gfsh may be used to identify how many entries are in the region on each server by looking at statistics.

    gfsh>show metrics --categories=partition --region=/EmployeeRegion --member=server1
    

    Within the output, the value of totalBucketSize identifies the number of entries hosted on the specified server. Vary the command to see the statistics for both server1 and server2. Approximately half of the entries will be on each server, and the split may vary if the example is started over and run again.

  6. The region entries are distributed across both servers. Kill one of the servers:

    $ $GEODE_HOME/bin/gfsh
    ...
    gfsh>connect
    gfsh>stop server --name=server1
    gfsh>quit
    
  7. Run the consumer a second time, and notice that approximately half of the entries in the EmployeeRegion are still available on the remaining server. The entries hosted by the server that was stopped are lost.

    $ ../gradlew run -Pmain=Consumer
    ...
    ...
    INFO: 5 entries in EmployeeRegion on the server(s).
    ...
    
  8. Shut down the system:

    $ scripts/stopAll.sh
    
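The count printed by the Consumer in steps 5 and 7 can also be checked with a few lines of client code. The sketch below is illustrative rather than the example's actual Consumer source; as in the earlier sketch, the locator address and port are assumptions, and the keys and values are treated as opaque objects.

    import java.util.Map;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;

    public class ConsumerSketch {
      public static void main(String[] args) {
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("127.0.0.1", 10334)
            .create();

        Region<Object, Object> employees = cache
            .createClientRegionFactory(ClientRegionShortcut.PROXY)
            .create("EmployeeRegion");

        // keySetOnServer() asks the servers for the keys they currently hold, and
        // getAll() fetches the matching values.
        Map<Object, Object> entries = employees.getAll(employees.keySetOnServer());
        System.out.println(entries.size() + " entries in EmployeeRegion on the server(s).");
        entries.forEach((k, v) -> System.out.println(k + " => " + v));

        cache.close();
      }
    }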

Things to Get Right for Partitioned Regions

  • Hashing distributes entries among buckets that reside on servers. A good hash code is important for spreading the entries evenly among buckets (and therefore among servers).

  • In addition to hashCode(), the key class must define equals(); see the sketch after this list.

  • A production system that must not lose data when a member goes down should use redundancy in conjunction with partitioning; a server-side sketch of a redundant partitioned region also follows this list.
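
To make the first two points concrete, here is a sketch of a key class that overrides both hashCode() and equals(). It is illustrative only, not the example's EmployeeKey source, and its field names are assumptions.

    import java.util.Objects;

    public class EmployeeKeySketch {
      private final String name;          // illustrative fields, not the example's actual ones
      private final int employeeNumber;

      public EmployeeKeySketch(String name, int employeeNumber) {
        this.name = name;
        this.employeeNumber = employeeNumber;
      }

      // A hash code computed from all key fields spreads entries evenly across buckets,
      // and therefore across the servers that host them.
      @Override
      public int hashCode() {
        return Objects.hash(name, employeeNumber);
      }

      // equals() must agree with hashCode() so that a get() with an equal key finds
      // the entry that was put.
      @Override
      public boolean equals(Object obj) {
        if (this == obj) {
          return true;
        }
        if (!(obj instanceof EmployeeKeySketch)) {
          return false;
        }
        EmployeeKeySketch other = (EmployeeKeySketch) obj;
        return employeeNumber == other.employeeNumber && Objects.equals(name, other.name);
      }
    }

For the last point, a partitioned region can keep a redundant copy of each bucket on another member. A minimal server-side sketch, assuming code that runs inside a server JVM, is:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.RegionShortcut;

    public class RedundantRegionSketch {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // PARTITION_REDUNDANT keeps one redundant copy of every bucket on a different
        // member, so stopping a single server no longer loses entries.
        Region<Object, Object> employees = cache
            .createRegionFactory(RegionShortcut.PARTITION_REDUNDANT)
            .create("EmployeeRegion");
      }
    }

The same region can be created from gfsh with create region --name=EmployeeRegion --type=PARTITION_REDUNDANT.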