{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Apache Spot's IPython Advanced Mode\n",
"## Proxy\n",
"\n",
"This guide shows how to request data and how to display and analyze it with libraries such as pandas.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Import Libraries**\n",
"\n",
"The next cell imports the libraries needed to run the functions in this notebook. Do not remove it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import datetime\n",
"import json\n",
"import pandas as pd\n",
"import numpy as np\n",
"import linecache, bisect\n",
"import os\n",
"\n",
"# The name of the current working directory is used as the query date (YYYYMMDD)\n",
"spath = os.getcwd()\n",
"path = spath.split(\"/\")\n",
"date = path[-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Request Data**\n",
"\n",
"To request data we use GraphQL (a query language for APIs; more information at http://graphql.org/).\n",
"\n",
"The function below makes a data request; all you need is a query and its variables.\n"
]
},
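{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# GraphQLClient is not imported here; it is assumed to be provided by the Spot notebook environment\n",
"def makeGraphqlRequest(query, variables):\n",
"    return GraphQLClient.request(query, variables)"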
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have a function, we can run a query like this:\n",
"\n",
"*Note: There is no need to set the date for the query manually; by default the code reads the date from the current path.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"suspicious_query = \"\"\"query($date:SpotDateType) {\n",
" proxy {\n",
" suspicious(date:$date)\n",
" { clientIp\n",
" clientToServerBytes\n",
" datetime\n",
" duration\n",
" host\n",
" networkContext\n",
" referer\n",
" requestMethod\n",
" responseCode\n",
" responseCodeLabel\n",
" responseContentType\n",
" score\n",
" serverIp\n",
" serverToClientBytes\n",
" uri\n",
" uriPath\n",
" uriPort\n",
" uriQuery\n",
" uriRep\n",
" userAgent\n",
" username\n",
" webCategory \n",
" }\n",
" }\n",
" }\"\"\"\n",
"\n",
"##If you want to use a different date for your query, switch the \n",
"##commented/uncommented following lines\n",
"\n",
"variables={\n",
" 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')\n",
"# 'date': \"2016-10-08\"\n",
" }\n",
" \n",
"suspicious_request = makeGraphqlRequest(suspicious_query,variables)\n",
"\n",
"##The variable suspicious_request will contain the resulting data from the query.\n",
"results = suspicious_request['data']['proxy']['suspicious']\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas DataFrames\n",
"\n",
"The following cell loads the results into a pandas DataFrame.\n",
"\n",
"For more information on how to use pandas, see: https://pandas.pydata.org/pandas-docs/stable/10min.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"df = pd.read_json(json.dumps(results))\n",
"## Print only the selected columns from the dataframe\n",
"print(df[['clientIp','uriQuery','datetime','clientToServerBytes','serverToClientBytes', 'host']])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional operations\n",
"\n",
"Additional operations can be performed on the dataframe, such as sorting, filtering, and grouping the data.\n",
"\n",
"**Filtering the data**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Filter results where the client IP is 10.173.202.136\n",
"## The resulting data will be stored in df2\n",
"\n",
"df2 = df[df['clientIp'].isin(['10.173.202.136'])]\n",
"print(df2[['clientIp','uriQuery','datetime','host']])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ordering the data**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Sort the results by host\n",
"srtd = df.sort_values(by=\"host\")\n",
"print(srtd[['host','clientIp','uriQuery','datetime']])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Grouping the data**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Group the results by pairs of client IP and host,\n",
"## summing the remaining columns\n",
"grpd = df.groupby(['clientIp','host']).sum()\n",
"## Print the resulting dataframe, showing the input and output bytes columns\n",
"print(grpd[[\"clientToServerBytes\",\"serverToClientBytes\"]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Reset Scored Connections**\n",
"\n",
"Uncomment and execute the following cell to reset all scored connections for this day."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# reset_scores = \"\"\"mutation($date:SpotDateType!) {\n",
"# proxy{\n",
"# resetScoredConnections(date:$date){\n",
"# success\n",
"# }\n",
"# }\n",
"# }\"\"\"\n",
"\n",
"\n",
"# variables={\n",
"# 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')\n",
"# }\n",
" \n",
"# request = makeGraphqlRequest(reset_scores,variables)\n",
"\n",
"\n",
"# print request['data']['proxy']['resetScoredConnections']['success']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sandbox\n",
"\n",
"At this point you can perform your own analysis using the previously provided functions as a guide.\n",
"\n",
"Happy threat hunting!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Your code here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}