| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# Apache Spot's IPython Advanced Mode\n", |
| "## Proxy\n", |
| "\n", |
| "This guide shows how to request data and how to explore it with libraries such as pandas.\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "**Import Libraries**\n", |
| "\n", |
| "The next cell imports the libraries needed to run the functions in this notebook. Do not remove it." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "import datetime\n", |
| "import json\n", |
| "import pandas as pd\n", |
| "import numpy as np\n", |
| "import linecache, bisect\n", |
| "import os\n", |
| "\n", |
| "# The name of the current directory is expected to be the date (YYYYMMDD)\n", |
| "spath = os.getcwd()\n", |
| "path = spath.split(\"/\")\n", |
| "date = path[-1]" |
| ] |
| }, |
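| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "The cell above takes the last component of the working directory as the date. As a minimal sketch, assuming that directory is named in `YYYYMMDD` form, the conversion to the `YYYY-MM-DD` string used in the query variables could be wrapped in a helper. The name `spot_date_from_path` and the sample path are hypothetical and not part of Apache Spot." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "import datetime\n", |
| "\n", |
| "def spot_date_from_path(path):\n", |
| "    # Hypothetical helper: read the last path component as YYYYMMDD\n", |
| "    # and return it as YYYY-MM-DD; strptime raises ValueError otherwise\n", |
| "    name = path.rstrip('/').split('/')[-1]\n", |
| "    return datetime.datetime.strptime(name, '%Y%m%d').strftime('%Y-%m-%d')\n", |
| "\n", |
| "# Made-up path, used only for illustration\n", |
| "print(spot_date_from_path('/tmp/proxy/20161008'))" |
| ] |
| }, |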
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "**Request Data**\n", |
| "\n", |
| "To request data we use GraphQL (a query language for APIs; more info at http://graphql.org/).\n", |
| "\n", |
| "The function below makes a data request; all you need is a query and its variables.\n" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "def makeGraphqlRequest(query, variables):\n", |
| "    return GraphQLClient.request(query, variables)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Now that we have a function, we can run a query like this:\n", |
| "\n", |
| "*Note: There is no need to set the date for the query manually; by default the code reads the date from the current path.*" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "suspicious_query = \"\"\"query($date:SpotDateType) {\n", |
| "    proxy {\n", |
| "        suspicious(date:$date) {\n", |
| "            clientIp\n", |
| "            clientToServerBytes\n", |
| "            datetime\n", |
| "            duration\n", |
| "            host\n", |
| "            networkContext\n", |
| "            referer\n", |
| "            requestMethod\n", |
| "            responseCode\n", |
| "            responseCodeLabel\n", |
| "            responseContentType\n", |
| "            score\n", |
| "            serverIp\n", |
| "            serverToClientBytes\n", |
| "            uri\n", |
| "            uriPath\n", |
| "            uriPort\n", |
| "            uriQuery\n", |
| "            uriRep\n", |
| "            userAgent\n", |
| "            username\n", |
| "            webCategory\n", |
| "        }\n", |
| "    }\n", |
| "}\"\"\"\n", |
| "\n", |
| "## To query a different date, comment out the first 'date' line below\n", |
| "## and uncomment the second one.\n", |
| "\n", |
| "variables={\n", |
| " 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')\n", |
| "# 'date': \"2016-10-08\"\n", |
| " }\n", |
| " \n", |
| "suspicious_request = makeGraphqlRequest(suspicious_query,variables)\n", |
| "\n", |
| "##The variable suspicious_request will contain the resulting data from the query.\n", |
| "results = suspicious_request['data']['proxy']['suspicious']\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Pandas Dataframes\n", |
| "\n", |
| "The following cell loads the results into a pandas dataframe.\n", |
| "\n", |
| "For more information on how to use pandas, see https://pandas.pydata.org/pandas-docs/stable/10min.html" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "df = pd.read_json(json.dumps(results))\n", |
| "## Print only the selected columns from the dataframe\n", |
| "print df[['clientIp','uriQuery','datetime','clientToServerBytes','serverToClientBytes', 'host']]\n" |
| ] |
| }, |
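| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "If `results` is a list of dictionaries, as the GraphQL response is assumed to be here, `pd.DataFrame` can also build the dataframe directly from it. The following sketch uses made-up records (`sample_results` is hypothetical) rather than real query output." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "import pandas as pd\n", |
| "\n", |
| "# Made-up records shaped like a few fields of the suspicious query\n", |
| "sample_results = [\n", |
| "    {'clientIp': '10.0.0.1', 'host': 'example.com', 'clientToServerBytes': 100, 'serverToClientBytes': 2000},\n", |
| "    {'clientIp': '10.0.0.1', 'host': 'example.com', 'clientToServerBytes': 150, 'serverToClientBytes': 500},\n", |
| "    {'clientIp': '10.0.0.2', 'host': 'example.org', 'clientToServerBytes': 300, 'serverToClientBytes': 50}\n", |
| "]\n", |
| "\n", |
| "# Build the dataframe from the list of dictionaries\n", |
| "sample_df = pd.DataFrame(sample_results)\n", |
| "print(sample_df[['clientIp', 'host', 'clientToServerBytes']])" |
| ] |
| }, |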
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Additional operations\n", |
| "\n", |
| "Additional operations can be performed on the dataframe, such as sorting, filtering, and grouping the data.\n", |
| "\n", |
| "**Filtering the data**" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "## Filter results where the client IP is 10.173.202.136\n", |
| "## The resulting data will be stored in df2\n", |
| "\n", |
| "df2 = df[df['clientIp'].isin(['10.173.202.136'])]\n", |
| "print df2[['clientIp','uriQuery','datetime','host']]" |
| ] |
| }, |
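| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "`isin` accepts several values, and boolean conditions can be combined with `&`. A minimal sketch on made-up data (the `sample_df` values and the 1000-byte threshold are hypothetical):" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "import pandas as pd\n", |
| "\n", |
| "# Made-up data, used only for illustration\n", |
| "sample_df = pd.DataFrame({\n", |
| "    'clientIp': ['10.0.0.1', '10.0.0.1', '10.0.0.2'],\n", |
| "    'host': ['example.com', 'example.net', 'example.org'],\n", |
| "    'serverToClientBytes': [2000, 500, 50]\n", |
| "})\n", |
| "\n", |
| "# Keep rows from the listed clients that received more than 1000 bytes\n", |
| "mask = sample_df['clientIp'].isin(['10.0.0.1', '10.0.0.2']) & (sample_df['serverToClientBytes'] > 1000)\n", |
| "filtered = sample_df[mask]\n", |
| "print(filtered)" |
| ] |
| }, |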
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "**Ordering the data**" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "srtd = df.sort_values(by=\"host\")\n", |
| "print srtd[['host','clientIp','uriQuery','datetime']]" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "**Grouping the data**" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "## This command groups the results by pairs of client IP and host,\n", |
| "## summing all other columns\n", |
| "grpd = df.groupby(['clientIp','host']).sum()\n", |
| "## This prints the resulting dataframe, displaying the client-to-server and server-to-client byte columns\n", |
| "print grpd[[\"clientToServerBytes\",\"serverToClientBytes\"]]" |
| ] |
| }, |
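| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "To restrict the aggregation to the byte counters, the columns can be selected before calling `sum`. A minimal sketch on made-up data (`sample_df` and `byte_totals` are hypothetical names):" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": false |
| }, |
| "outputs": [], |
| "source": [ |
| "import pandas as pd\n", |
| "\n", |
| "# Made-up data, used only for illustration\n", |
| "sample_df = pd.DataFrame({\n", |
| "    'clientIp': ['10.0.0.1', '10.0.0.1', '10.0.0.2'],\n", |
| "    'host': ['example.com', 'example.com', 'example.org'],\n", |
| "    'clientToServerBytes': [100, 150, 300],\n", |
| "    'serverToClientBytes': [2000, 500, 50]\n", |
| "})\n", |
| "\n", |
| "# Group by client IP and host, then sum only the two byte columns\n", |
| "byte_columns = ['clientToServerBytes', 'serverToClientBytes']\n", |
| "byte_totals = sample_df.groupby(['clientIp', 'host'])[byte_columns].sum()\n", |
| "print(byte_totals)" |
| ] |
| }, |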
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "**Reset Scored Connections**\n", |
| "\n", |
| "Uncomment and execute the following cell to reset all scored connections for this day." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": true |
| }, |
| "outputs": [], |
| "source": [ |
| "# reset_scores = \"\"\"mutation($date:SpotDateType!) {\n", |
| "# proxy{\n", |
| "# resetScoredConnections(date:$date){\n", |
| "# success\n", |
| "# }\n", |
| "# }\n", |
| "# }\"\"\"\n", |
| "\n", |
| "\n", |
| "# variables={\n", |
| "# 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')\n", |
| "# }\n", |
| " \n", |
| "# request = makeGraphqlRequest(reset_scores,variables)\n", |
| "\n", |
| "\n", |
| "# print request['data']['proxy']['resetScoredConnections']['success']" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Sandbox\n", |
| "\n", |
| "At this point you can perform your own analysis using the previously provided functions as a guide.\n", |
| "\n", |
| "Happy threat hunting!" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "collapsed": true |
| }, |
| "outputs": [], |
| "source": [ |
| "#Your code here" |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": "Python 2", |
| "language": "python", |
| "name": "python2" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 2 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython2", |
| "version": "2.7.10" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 0 |
| } |