Data Schema for the Spot Suspicious Connects Analyses
The Spot Suspicious Connects analyses can process three kinds of network logs: Netflow records, DNS queries, and proxy logs.
Data from these logs is expected to be stored in Parquet files on HDFS, with a column structure that extends the schema listed below for the corresponding log type.

Netflow records:
- trhour: The hour of the day in which the flow was logged, Integer.
- sip: The source IP address of the flow, String.
- dip: The destination IP address of the flow, String.
- sport: The source port of the flow, Integer.
- dport: The destination port of the flow, Integer.
- proto: The protocol used by the flow, String.
- ipkt: The flow's packet count, Long.
- ibyt: The flow's byte count, Long.

DNS queries:
- frame_time: Frame timestamp of query, String.
- unix_tstamp: Unix timestamp of query, String.
- frame_len: Frame length, Integer.
- ip_dst: IP address of client making query, String.
- dns_qry_name: Name of DNS query, String.
- dns_qry_class: Class of DNS query, String.
- dns_qry_type: Type of DNS query, Integer.
- dns_qry_rcode: DNS query response code, Integer.

Proxy logs:
- p_date: Date of the query, String.
- p_time: Time of the query, String. (The format is not specified.)
- clientip: IP address of client making proxy request, String.
- host: Host of request, String.
- reqmethod: Request method, String.
- useragent: User agent, String.
- resconttype: Response content type, String.
- respcode: Response code, String.
- fulluri: Full URI of request, String.
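
As a rough illustration of what "extending" a schema could mean in practice, the following Python sketch records the columns listed above and checks whether a table's columns cover them. It assumes that "extending" means the table may carry extra columns but must contain every listed column with the listed type. The names `SCHEMAS` and `schema_problems` are hypothetical and not part of Spot, and the type names are the plain labels used in the lists above rather than Spark or Parquet type objects.

```python
# Column names and type labels copied from the lists above.
NETFLOW_SCHEMA = {
    "trhour": "Integer", "sip": "String", "dip": "String",
    "sport": "Integer", "dport": "Integer", "proto": "String",
    "ipkt": "Long", "ibyt": "Long",
}
DNS_SCHEMA = {
    "frame_time": "String", "unix_tstamp": "String", "frame_len": "Integer",
    "ip_dst": "String", "dns_qry_name": "String", "dns_qry_class": "String",
    "dns_qry_type": "Integer", "dns_qry_rcode": "Integer",
}
PROXY_SCHEMA = {
    "p_date": "String", "p_time": "String", "clientip": "String",
    "host": "String", "reqmethod": "String", "useragent": "String",
    "resconttype": "String", "respcode": "String", "fulluri": "String",
}

# Hypothetical lookup table keyed by log type.
SCHEMAS = {"netflow": NETFLOW_SCHEMA, "dns": DNS_SCHEMA, "proxy": PROXY_SCHEMA}


def schema_problems(log_type, columns):
    """List the reasons why `columns` (name -> type label) fails to extend
    the schema for `log_type`. An empty list means the schema is covered."""
    problems = []
    for name, expected in SCHEMAS[log_type].items():
        actual = columns.get(name)
        if actual is None:
            problems.append(f"missing column {name} ({expected})")
        elif actual != expected:
            problems.append(f"column {name} is {actual}, expected {expected}")
    return problems
```

Under this reading, a Netflow table with an additional column still passes, while a table that lacks a listed column or stores it with a different type is reported:

```python
extended = dict(NETFLOW_SCHEMA, site="String")   # extra column is allowed
print(schema_problems("netflow", extended))

incomplete = {k: v for k, v in DNS_SCHEMA.items() if k != "dns_qry_name"}
print(schema_problems("dns", incomplete))
```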