blob: 5e2bf734d7ff1ddee829997b538eb78af090c678 [file] [log] [blame]
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "CloudStack_GSoC_Guide.ent">
%BOOK_ENTITIES;
]>
<!-- Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="gsoc-dharmesh">
<title>Dharmesh's 2013 GSoC Proposal</title>
<para>This chapter describes Dharmrsh's 2013 Google Summer of Code project within the &PRODUCT; ASF project. It is a copy paste of the submitted proposal.</para>
<section id="abstract-dharmesh">
<title>Abstract</title>
<para>
The project aims to bring <ulink url="http://aws.amazon.com/cloudformation/"><citetitle>cloudformation</citetitle></ulink> like service to cloudstack. One of the prime use-case is cluster computing frameworks on cloudstack. A cloudformation service will give users and administrators of cloudstack ability to manage and control a set of resources easily. The cloudformation will allow booting and configuring a set of VMs and form a cluster. Simple example would be LAMP stack. More complex clusters such as mesos or hadoop cluster requires a little more advanced configuration. There is already some work done by Chiradeep Vittal at this front [5]. In this project, I will implement server side cloudformation service for cloudstack and demonstrate how to run mesos cluster using it.
</para>
</section>
<section id="mesos">
<title>Mesos</title>
<para>
<ulink url="http://incubator.apache.org/mesos/"><citetitle>Mesos</citetitle></ulink> is a resource management platform for clusters. It aims to increase resource utilization of clusters by sharing cluster resources among multiple processing frameworks(like MapReduce, MPI, Graph Processing) or multiple instances of same framework. It provides efficient resource isolation through use of containers. Uses zookeeper for state maintenance and fault tolerance.
</para>
</section>
<section id="mesos-use">
<title>What can run on mesos ?</title>
<para><emphasis role="bold">Spark:</emphasis> A cluster computing framework based on the Resilient Distributed Datasets (RDDs) abstraction. RDD is more generalized than MapReduce and can support iterative and interactive computation while retaining fault tolerance, scalability, data locality etc.</para>
<para><emphasis role="bold">Hadoop:</emphasis>: Hadoop is fault tolerant and scalable distributed computing framework based on MapReduce abstraction.</para>
<para><emphasis role="bold">Begel:</emphasis>: A graph processing framework based on pregel.</para>
<para>and other frameworks like MPI, Hypertable.</para>
</section>
<section id="mesos-deploy">
<title>How to deploy mesos ?</title>
<para>Mesos provides cluster installation <ulink url="https://github.com/apache/mesos/blob/trunk/docs/Deploy-Scripts.textile"><citetitle>scripts</citetitle></ulink> for cluster deployment. There are also scripts available to deploy a cluster on <ulink url="https://github.com/apache/mesos/blob/trunk/docs/EC2-Scripts.textile"><citetitle>Amazon EC2</citetitle></ulink>. It would be interesting to see if this scripts can be leveraged in anyway.</para>
</section>
<section id="deliverables-dharmesh">
<title>Deliverables</title>
<orderedlist>
<listitem>
<para>Deploy CloudStack and understand instance configuration/contextualization</para>
</listitem>
<listitem>
<para>Test and deploy Mesos on a set of CloudStack based VM, manually. Design/propose an automation framework</para>
</listitem>
<listitem>
<para>Test stackmate and engage chiradeep (report bugs, make suggestion, make pull request)</para>
</listitem>
<listitem>
<para>Create cloudformation template to provision a Mesos Cluster</para>
</listitem>
<listitem>
<para>Compare with Apache Whirr or other cluster provisioning tools for server side implementation of cloudformation service.</para>
</listitem>
</orderedlist>
</section>
<section id="arch-and-tools">
<title>Architecture and Tools</title>
<para>The high level architecture is as follows:</para>
<para>
<mediaobject>
<imageobject>
<imagedata fileref="images/mesos-integration-arch.jpg"/>
</imageobject>
</mediaobject>
</para>
<para>It includes following components:</para>
<orderedlist>
<listitem>
<para>CloudFormation Query API server:</para>
<para>This acts as a point of contact to and exposes CloudFormation functionality as Query API. This can be accessed directly or through existing tools from Amazon AWS for their cloudformation service. It will be easy to start as a module which resides outside cloudstack at first and I plan to use dropwizard [3] to start with. Later may be the API server can be merged with cloudstack core. I plan to use mysql for storing details of clusters.</para>
</listitem>
<listitem>
<para>Provisioning:</para>
<para>Provisioning module is responsible for handling the booting process of the VMs through cloudstack. This uses the cloudstack APIs for launching VMs. I plan to use preconfigured templates/images with required dependencies installed, which will make cluster creation process much faster even for large clusters. Error handling is very important part of this module. For example, what you do if few VMs fail to boot in cluster ?</para>
</listitem>
<listitem>
<para>Configuration:</para>
<para>This module deals with configuring the VMs to form a cluster. This can be done via manual scripts/code or via configuration management tools like chef/ironfan/knife. Potentially workflow automation tools like rundeck [4] also can be used. Also Apache whirr and Provisionr are options. I plan to explore this tools and select suitable ones.</para>
</listitem>
</orderedlist>
</section>
<section id="api">
<title>API</title>
<para>Query <ulink url="http://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_Operations.html"><citetitle>API</citetitle></ulink> will be based on Amazon AWS cloudformation service. This will allow leveraging existing <ulink url="http://aws.amazon.com/developertools/AWS-CloudFormation"><citetitle>tools</citetitle></ulink> for AWS.</para>
</section>
<section id="timeline">
<title>Timeline</title>
<para>1-1.5 week : project design. Architecture, tools selection, API design</para>
<para>1-1.5 week : getting familiar with cloudstack and stackmate codebase and architecture details</para>
<para>1-1.5 week : getting familiar with mesos internals</para>
<para>1-1.5 week : setting up the dev environment and create mesos templates</para>
<para>2-3 week : build provisioning and configuration module</para>
<para>Midterm evaluation: provisioning module, configuration module</para>
<para>2-3 week : develope cloudformation server side implementation</para>
<para>2-3 week : test and integrate</para>
</section>
<section id="future-work">
<title>Future Work</title>
<orderedlist>
<listitem>
<para><emphasis role="bold">Auto Scaling:</emphasis></para>
<para>Automatically adding or removing VMs from mesos cluster based on various conditions like utilization going above/below a static threshold. There can be more sophisticated strategies based on prediction or fine grained metric collection with tight integration with mesos framework.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Cluster Simulator:</emphasis></para>
<para>Integrating with existing simulator to simulate mesos clusters. This can be useful in various scenarios, for example while developing a new scheduling algorithm, testing autoscaling etc.</para>
</listitem>
</orderedlist>
</section>
</chapter>