DNS Labeled Data Set

An IXIA BreakingPoint box was used to simulate both normal and attack (DNS tunneling) DNS traffic. The resulting pcaps were captured, and the fields relevant to Apache Spot (incubating) were ingested and stored in Parquet format. Attack traffic can be distinguished from normal traffic by codes inserted into the Transaction ID field (‘dns_id’ after ingestion), which mark each record either as normal traffic or as a specific type of DNS tunneling. This document provides the data schema, the location and specifications of the data in Amazon S3, and guidance on interpreting the dns_id field.

Data Schema

The schema for this data includes one field (called ‘dns_id’) in addition to what is usually used for DNS data in Apache Spot (incubating). The schema is as follows (see: http://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6 for more information):

| Name | Type | Description |
|---|---|---|
| frame_time | string | Time of packet capture (UTC) |
| unix_tstamp | bigint | Time of packet capture (UNIX time) |
| frame_len | int | Entire packet length |
| ip_dst | string | IP address making the DNS query |
| ip_src | string | IP address of the DNS server |
| dns_qry_name | string | Resource record being queried, ex: ‘google.com’ |
| dns_qry_class | string | Class of DNS record, ex: ‘0x00000001’ (for Internet) |
| dns_qry_type | int | Type of resource record, ex: 1 (for a host address) |
| dns_qry_rcode | int | Error code for the results of the query, ex: 0 (for No Error) |
| dns_a | string | Answer to the query |
| dns_id | string | Hexadecimal code inserted as the transaction ID, used to differentiate normal queries from tunneling (more details below) |

Interpreting dns_id

The value of dns_id indicates whether a data row was taken from a packet capture of simulated normal DNS traffic or from a packet capture of a particular type of simulated DNS tunneling.

Within BreakingPoint, Transaction IDs are represented as decimal numbers. However, tshark dissects the Transaction ID in its hexadecimal representation (the format shown in parentheses in the table below).
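
The correspondence between the two representations can be checked with a short Python sketch. The helper names `to_tshark_hex` and `from_tshark_hex` are hypothetical, and the eight-digit, zero-padded form is an assumption based on the values listed in the table below.

```python
def to_tshark_hex(transaction_id: int) -> str:
    """Format a decimal Transaction ID as zero-padded hex (assumed 8 digits)."""
    return f"0x{transaction_id:08x}"


def from_tshark_hex(hex_id: str) -> int:
    """Parse a hex string such as '0x000003f0' back to its decimal value."""
    return int(hex_id, 16)


if __name__ == "__main__":
    # Decimal IDs as configured in BreakingPoint, taken from the table below.
    for decimal_id in (1008, 1002, 1003, 1001, 1005, 1007):
        print(decimal_id, to_tshark_hex(decimal_id))
```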

Within Apache Spot (incubating), only responses from DNS servers are ingested, since a response packet contains both the query made by the client and the response from the server.

| Super Flow Name | Transaction ID | Description |
|---|---|---|
| Brandon_DNS_domain_Test | 1008 (0x000003f0) | [Normal] This super flow simulates normal DNS queries. |
| DNS_Tunnel_BE_1 | 1002 (0x000003ea) | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (URLs are random strings), with an IP address response (drawn from a file of randomly generated IPs) being sent via the DNS answer field. |
| DNS_Tunnel_BE_2 | 1003 (0x000003eb) | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (URLs are random strings), with a response being given as no such URL found. |
| TCP_DNS_Tunnel_BE_1 | 1001 (0x000003e9) | [Attack] This super flow simulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using hex0x20Hack encoding. |
| TCP_DNS_Tunnel_BE_2 | 1005 (0x000003ed) | [Attack] This super flow simulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base16Alpha encoding. |
| TCP_DNS_Tunnel_BE_3 | 1007 (0x000003ef) | [Attack] This super flow simulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base63 encoding. |
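
As an illustration of how these codes might be turned into labels, the following Python sketch maps a dns_id value to its super flow. The names `SUPER_FLOWS`, `parse_dns_id`, and `label_for` are hypothetical. The sketch assumes dns_id is stored as a hexadecimal string such as ‘0x000003f0’ and also accepts a plain decimal string such as ‘1008’; check the actual stored format before relying on it.

```python
# Decimal Transaction ID -> (super flow name, label), copied from the table above.
SUPER_FLOWS = {
    1008: ("Brandon_DNS_domain_Test", "normal"),
    1002: ("DNS_Tunnel_BE_1", "attack"),
    1003: ("DNS_Tunnel_BE_2", "attack"),
    1001: ("TCP_DNS_Tunnel_BE_1", "attack"),
    1005: ("TCP_DNS_Tunnel_BE_2", "attack"),
    1007: ("TCP_DNS_Tunnel_BE_3", "attack"),
}


def parse_dns_id(dns_id: str) -> int:
    """Convert a dns_id string to an integer.

    Assumption: values prefixed with '0x' are hexadecimal; anything else is
    treated as decimal.
    """
    text = dns_id.strip().lower()
    return int(text, 16) if text.startswith("0x") else int(text)


def label_for(dns_id: str) -> str:
    """Return 'normal', 'attack', or 'unknown' for a dns_id value."""
    _, label = SUPER_FLOWS.get(parse_dns_id(dns_id), (None, "unknown"))
    return label


if __name__ == "__main__":
    print(label_for("0x000003f0"))
    print(label_for("0x000003ea"))
```

Rows whose dns_id does not match any listed code are reported as ‘unknown’ rather than being silently treated as normal.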

Data Location

| File | Description | Location | Size |
|---|---|---|---|
| 20170509_parquet.tar.gz | Tarball of directory (parquet files) simulated May 9, 2017 | https://s3-us-west-2.amazonaws.com/apachespot/public_data_sets/dns_labeled_data/20170509_parquet.tar.gz | 5.3G |
| 20170509_DATA_SPEC.md | Markdown document (the one you are reading now) | https://s3-us-west-2.amazonaws.com/apachespot/public_data_sets/dns_labeled_data/20170509_DATA_SPEC.md | 4K |

Number of Rows Associated with Each Value of dns_id in the Data File

| Simulation Date | Total Records | dns_id=1008 | dns_id=1002 | dns_id=1003 | dns_id=1001 | dns_id=1005 | dns_id=1007 |
|---|---|---|---|---|---|---|---|
| 5/9/2017 | 391,364,387 | 391,314,477 | 16,317 | 21,666 | 4,156 | 2,743 | 5,028 |
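
The counts above can be cross-checked with a few lines of Python. This is only an arithmetic sketch over the published figures, not a query against the data itself. The variable names are illustrative.

```python
# Row counts per decimal dns_id, copied from the table above.
counts = {
    1008: 391_314_477,  # normal traffic
    1002: 16_317,
    1003: 21_666,
    1001: 4_156,
    1005: 2_743,
    1007: 5_028,
}
total_records = 391_364_387

# The per-ID counts should add up to the reported total.
counts_sum = sum(counts.values())

# Every ID other than 1008 is a tunneling super flow.
attack_rows = total_records - counts[1008]
attack_fraction = attack_rows / total_records

if __name__ == "__main__":
    print(counts_sum == total_records)
    print(attack_rows, f"{attack_fraction:.6%}")
```

The per-ID counts sum to the reported total, and the 49,910 attack rows make up roughly 0.013% of all records, so the data set is heavily imbalanced toward normal traffic.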