<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->
## Administrator setup
### With [`Cloud Shell`](https://console.aws.amazon.com/cloudshell/home):
The commands below assume the following names:
| Name | Value |
| --- | --- |
| `UserName` | `systemds-bot` |
| `GroupName` | `systemds-group` |
#### 1. Create a user and a group
Create the user and the group. The user is added to the group in step 3.
[`create-user`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-user.html)
```sh
[cloudshell-user@host ~]$ aws iam create-user --user-name systemds-bot
{
"User": {
"Path": "/",
"UserName": "systemds-bot",
"UserId": "AIDAQSHHX7DDAODFXYZ3",
"Arn": "arn:aws:iam::12345:user/systemds-bot",
"CreateDate": "2021-04-10T20:36:59+00:00"
}
}
```
[`create-group`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-group.html)
```sh
[cloudshell-user@host ~]$ aws iam create-group --group-name systemds-group
{
"Group": {
"Path": "/",
"GroupName": "systemds-group",
"GroupId": "AGPAQSHHX7DDB3XYZABCW",
"Arn": "arn:aws:iam::12345:group/systemds-group",
"CreateDate": "2021-04-10T20:41:58+00:00"
}
}
```
#### 2. Attach policies to the group
[`attach-group-policy`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/attach-group-policy.html)
```sh
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole --group-name systemds-group
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role --group-name systemds-group
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonElasticMapReduceFullAccess --group-name systemds-group
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AWSKeyManagementServicePowerUser --group-name systemds-group
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/IAMUserSSHKeys --group-name systemds-group
# Grant cloud shell access too.
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AWSCloudShellFullAccess --group-name systemds-group
# To create EC2 keys
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess --group-name systemds-group
```
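The same attachments can be written as a loop, which keeps the policy list in one place. This is a minimal sketch, not part of the SystemDS scripts: the `DRY_RUN` switch and the `attach_policies` function are hypothetical names introduced here. With the default `DRY_RUN=1` the commands are only printed.

```sh
# Sketch: attach the managed policies listed above in a loop.
# DRY_RUN is a hypothetical switch; set DRY_RUN=0 to actually call aws.
GROUP_NAME="systemds-group"
POLICIES="service-role/AmazonElasticMapReduceRole
service-role/AmazonElasticMapReduceforEC2Role
AmazonElasticMapReduceFullAccess
AWSKeyManagementServicePowerUser
IAMUserSSHKeys
AWSCloudShellFullAccess
AmazonEC2FullAccess"

attach_policies() {
  for policy in $POLICIES; do
    arn="arn:aws:iam::aws:policy/${policy}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Print the command instead of running it.
      echo "aws iam attach-group-policy --policy-arn ${arn} --group-name ${GROUP_NAME}"
    else
      aws iam attach-group-policy --policy-arn "${arn}" --group-name "${GROUP_NAME}"
    fi
  done
}

attach_policies
```

Review the printed commands first, then rerun with `DRY_RUN=0 attach_policies`.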
#### 3. Add user to the group
```sh
aws iam add-user-to-group --user-name systemds-bot --group-name systemds-group
```
#### 4. Create the login-profile with credentials
```sh
$ aws iam create-login-profile --generate-cli-skeleton > login-profile.json
```
`login-profile.json` contains the input fields at the top level, which is the shape `--cli-input-json` expects:
```json
{
"UserName": "",
"Password": "",
"PasswordResetRequired": false
}
```
Create the credentials manually by editing `login-profile.json`.
| Name | Value |
| --- | --- |
| `UserName` | `systemds-bot` |
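| `Password` | A password of your choice, for example `9U*tYP` (placeholder only; the real value must satisfy the account's IAM password policy) |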
| `PasswordResetRequired` | `false` |
Now, create the login profile.
```sh
aws iam create-login-profile --cli-input-json file://login-profile.json
```
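As an alternative to editing the JSON file, the same values can be passed as command-line options. The sketch below assumes that a random 24-character base64 string satisfies your account's password policy. `generate_password`, `create_login_profile`, and the `DRY_RUN` switch are hypothetical helpers introduced here; with the default `DRY_RUN=1` the command is printed with the password hidden.

```sh
# Sketch: create the login profile without an intermediate JSON file.
USER_NAME="systemds-bot"

# Hypothetical helper: 18 random bytes, base64-encoded to 24 characters.
generate_password() {
  head -c 18 /dev/urandom | base64
}

create_login_profile() {
  password="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Never echo the real password, even in dry-run mode.
    echo "aws iam create-login-profile --user-name ${USER_NAME} --password <hidden> --no-password-reset-required"
  else
    aws iam create-login-profile --user-name "${USER_NAME}" \
      --password "${password}" --no-password-reset-required
  fi
}

PASSWORD="$(generate_password)"
create_login_profile "${PASSWORD}"
```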
---
### With [`AWS CLI`](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html):
1. Create an AWS account, or use your existing one.
2. Install the `aws-cli` build for your operating system.
3. Create a user
    * Create a new user (https://console.aws.amazon.com/iam/home?#/users)
    * Create a new group, add the user to it, and attach the following policies to the group:
        - AmazonElasticMapReduceRole
        - AmazonElasticMapReduceforEC2Role
        - AdministratorAccess
        - AmazonElasticMapReduceFullAccess
        - AWSKeyManagementServicePowerUser
        - IAMUserSSHKeys
4. Configure your `aws-cli` (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration)
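After configuring, a quick sanity check is to confirm that the CLI is on the `PATH` and that the stored credentials resolve to an identity. The function name `check_aws_cli` is a hypothetical helper introduced for this sketch; `aws --version` and `aws sts get-caller-identity` are standard CLI commands.

```sh
# Sketch: verify the aws-cli installation and the configured credentials.
check_aws_cli() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws-cli not found on PATH; see step 2" >&2
    return 1
  fi
  aws --version
  # Prints the account ID and ARN that the configured credentials belong to.
  aws sts get-caller-identity
}
```

Call `check_aws_cli` once step 4 is done. If the credentials belong to the user created above, the reported ARN should name that user.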
## User setup
5. Spin up an EMR cluster with SystemDS
    * Put your SystemDS artifacts (DML scripts, jars, config file) in the directory `systemds`
    * Edit the configuration in `systemds_cluster.config`
    * Run: `./spinup_systemds_cluster.sh`
6. Run a SystemDS script
    * Run: `./run_systemds_script.sh path/to/script.dml`
    * With arguments: `./run_systemds_script.sh path/to/script.dml "1.0, 2.6"`
7. Terminate the EMR cluster: `./terminate_systemds_cluster.sh`
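Steps 5 to 7 can be chained so that the cluster is terminated even when the DML script fails. This is a sketch under two assumptions: the three scripts sit in the current directory, and the spin-up script returns only once the cluster is usable. `run`, `run_on_cluster`, and the `DRY_RUN` switch are hypothetical names introduced here; with the default `DRY_RUN=1` the commands are printed rather than executed.

```sh
# Sketch: spin up, run one script, and always terminate afterwards.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_on_cluster() {
  script="$1"
  args="${2:-}"
  # Stop early if the cluster could not be created.
  run ./spinup_systemds_cluster.sh || return 1
  rc=0
  if [ -n "$args" ]; then
    run ./run_systemds_script.sh "$script" "$args" || rc=$?
  else
    run ./run_systemds_script.sh "$script" || rc=$?
  fi
  # Terminate even when the DML script failed, so the cluster does not keep running.
  run ./terminate_systemds_cluster.sh
  return "$rc"
}

run_on_cluster path/to/script.dml "1.0, 2.6"
```

The function returns the exit status of the script run, so a failing DML script is still reported after the cluster has been shut down.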
#### Further work
* Fine-tune the memory settings:
    - https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
    - https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#spark-defaults
* Test whether the setup scales to 100 nodes
* Make the cluster web UIs (Ganglia, Spark UI, ...) accessible from outside
* Integrate spot instances