Domains per web users

This page collects the open datasets used in the papers:

  1. Luca Vassio, Danilo Giordano, Martino Trevisan, Marco Mellia, Ana Paula Couto da Silva, Users' Fingerprinting Techniques from TCP Traffic, ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, Los Angeles, USA, August 2017
  2. Luca Vassio, Flavio Figueiredo, Ana Paula Couto da Silva, Marco Mellia, Jussara Almeida, Mining and Modeling Web Trajectories from Passive Traces, IEEE BigData 2017 DS4N, Boston, MA, December 2017

 -----------------------------------------------------------------------------------------------------------------------------------------------

1

The anonymized visited domains and the list of core domains used to perform the experiments are reported.

The dataset with the visited domains can be downloaded from here and is composed of 4 columns:

1: The client IP address, anonymized

2: The timestamp, in seconds, at which the flow was generated

3: The domain, anonymized as a number between 000001 and 500k

4: A flag stating whether the domain is a Core Domain (True) or a Support Domain (False)
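
As an illustration of the four columns above, here is a minimal Python sketch that reads the visited-domain file and collects the distinct Core Domains seen by each client. The comma delimiter, the sample rows and the names Visit, read_visits and core_domains_per_client are assumptions made for this example; they are not taken from the dataset or the paper, so please adjust them to the actual file.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Visit(NamedTuple):
    client: str       # column 1: anonymized client IP address
    timestamp: float  # column 2: flow timestamp in seconds
    domain: str       # column 3: anonymized domain, kept as text to preserve leading zeros
    is_core: bool     # column 4: True for a Core Domain, False for a Support Domain


def read_visits(handle, delimiter=","):
    """Yield one Visit per well-formed row (the delimiter is an assumption)."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        client, ts, domain, flag = (field.strip() for field in row)
        yield Visit(client, float(ts), domain, flag == "True")


def core_domains_per_client(visits):
    """Map each client to the set of Core Domains it visited."""
    seen = defaultdict(set)
    for visit in visits:
        if visit.is_core:
            seen[visit.client].add(visit.domain)
    return dict(seen)


# Made-up sample rows, only to show the expected shape of the data.
SAMPLE = (
    "c1,1492000000,000017,True\n"
    "c1,1492000003,000912,False\n"
    "c2,1492000010,000017,True\n"
    "c1,1492000020,000044,True\n"
)

visits = list(read_visits(io.StringIO(SAMPLE)))
per_client = core_domains_per_client(visits)
```

To use it on the real file, pass an open file handle instead of the in-memory sample, for example open(path, newline=""), where path is wherever the download was saved.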

 

The list of 1000 Core Domains can be downloaded from here and is composed of 2 columns:

1: The domain

2: Whether the domain is a Core Domain (Core) or a Support Domain (Support)

For more details, please check the paper or contact us.

 

 ---------------------------------------------------------------------------------------------------------------------------------

2

Anonymized trajectories of domains and their TribeFlow models are reported.

The dataset with the visited domains can be downloaded from here and is composed of 4 columns:

1: Timestamp in seconds

2: The client IP address, anonymized

3: The original domain, anonymized as an integer

4: The landing domain, anonymized as an integer
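
As an illustration of this format, here is a minimal Python sketch that groups the rows by client and orders them by timestamp, giving one trajectory per client as a list of (original domain, landing domain) pairs. The comma delimiter, the sample rows and the function name read_trajectories are assumptions made for this example. This is not the input format expected by TribeFlow, which should be checked in the paper.

```python
import csv
import io
from collections import defaultdict


def read_trajectories(handle, delimiter=","):
    """Return {client: [(original, landing), ...]} ordered by timestamp.

    The column order follows the list above: timestamp, client,
    original domain, landing domain. The delimiter is an assumption.
    """
    by_client = defaultdict(list)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        ts, client, original, landing = (field.strip() for field in row)
        by_client[client].append((float(ts), int(original), int(landing)))
    # Sort each client's events by time, then drop the timestamp.
    return {
        client: [(orig, land) for _, orig, land in sorted(events)]
        for client, events in by_client.items()
    }


# Made-up sample rows, deliberately out of time order.
SAMPLE = (
    "10,c1,5,7\n"
    "3,c1,2,5\n"
    "4,c2,9,9\n"
)

trajectories = read_trajectories(io.StringIO(SAMPLE))
```

Sorting inside the function means the result does not depend on the rows already being in time order in the file.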

The TribeFlow models:

1: Campus model (download here)