Access Instructions

User Account

To access the BigData@Polito laboratory you need a user account. Please contact the responsible person in your department, or fill in this form to request the credentials.

System Requirements

All services of the BigData@Polito laboratory are accessible only from inside the Politecnico network. Wi-Fi users might need to use the Politecnico VPN to access some Web interfaces.

To interact with the system you need to be authenticated to the BIGDATA.POLITO.IT Kerberos realm. This is done automatically when you use the Access Gateway; otherwise, your system must be configured as a Kerberos client.

Kerberos Client Configuration 

Edit your Kerberos client configuration file, usually located in /etc/krb5.conf, to add the BigData@Polito servers:

[libdefaults]
 default_realm = BIGDATA.POLITO.IT
[realms]
BIGDATA.POLITO.IT = {
   kdc = ma1-bigdata.polito.it
   kdc = mb1-bigdata.polito.it
   admin_server = ma1-bigdata.polito.it
}

Verify that you can obtain a Kerberos ticket by issuing the command

$ kinit <your_username>

If the system is correctly configured, you will receive no error message. You can check the status of your ticket with the command

$ klist

 If you cannot obtain the ticket, check that the Kerberos KDCs can be reached from your terminal, or contact your IT support.
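The reachability check mentioned above can be sketched as a small shell script. This is only an illustration, not an official tool of the laboratory: it assumes that your system provides a netcat (nc) supporting the -z and -w options, that the KDC entries in your configuration file are bare host names as in the example above, and that the KDCs listen on TCP port 88, the standard Kerberos port. The function names list_kdcs and check_kdcs are made up for this example.

```shell
#!/bin/sh
# Sketch: probe the KDCs declared in a krb5.conf file.
# Assumes a netcat (nc) that supports -z and -w; adapt to your system.

# Print the host of every "kdc = ..." entry found in the given file.
list_kdcs() {
    awk -F= '/^[ \t]*kdc[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Try a TCP connection to port 88 (the standard Kerberos port) on each KDC.
check_kdcs() {
    for host in $(list_kdcs "$1"); do
        if nc -z -w 5 "$host" 88; then
            echo "$host: reachable on port 88"
        else
            echo "$host: NOT reachable on port 88"
        fi
    done
}
```

After loading the functions into your shell, run check_kdcs /etc/krb5.conf. A successful TCP connection only shows that the host and port can be reached from your network; it does not prove that your credentials are valid.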

Web Browser Configuration

To use the Web Interfaces for the BigData Hadoop Services you must configure your browser to use the Simple and Protected GSS-API Negotiation (SPNEGO) mechanism.

HTTP SPNEGO is enabled by default on Safari and Google Chrome. Instructions on how to enable SPNEGO in Firefox and other browsers are available at the links below. The HTTP SPNEGO whitelist must include the ".polito.it" domain.

Accessing the BigData@Polito Hadoop Services

The primary access to the BigData@Polito laboratory is via the Access Gateway bigdatalab.polito.it. By connecting with

$ ssh your_username@bigdatalab.polito.it

you can interact with the HDFS file system, copy your data to and from the cluster, and submit your jobs. You can find a list of available commands here.
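As an illustration of a typical session on the Access Gateway, the sketch below wraps a few standard hdfs dfs commands (-mkdir, -put, -ls, -get) in two helper functions. The function names and the file and directory names used in the usage note are hypothetical; the script assumes that the hdfs command is available on your PATH, as it should be on the gateway.

```shell
#!/bin/sh
# Sketch: copy data to and from HDFS with the standard "hdfs dfs" commands.

# Copy a local file into an HDFS directory and list the result.
# Usage: hdfs_upload <local_file> <hdfs_dir>
hdfs_upload() {
    local_file=$1
    hdfs_dir=$2
    hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -put "$local_file" "$hdfs_dir/" &&
    hdfs dfs -ls "$hdfs_dir"
}

# Copy a file or directory from HDFS back to the local file system.
# Usage: hdfs_download <hdfs_path> <local_dir>
hdfs_download() {
    hdfs dfs -get "$1" "$2"
}
```

For example, hdfs_upload data.csv input copies a local file named data.csv into the input directory under your HDFS home, and hdfs_download output . copies a result directory back to the current local directory.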

Web Interfaces to the Hadoop Services

For more flexible access to the system, you can use one of the web interfaces for the services.

HDFS

https://ma1-bigdata.polito.it:50470/

https://mb1-bigdata.polito.it:50470/

YARN Job History

https://ma1-bigdata.polito.it:19890/jobhistory/

YARN Resource Manager

https://ma1-bigdata.polito.it:8090/cluster/

https://mb1-bigdata.polito.it:8090/cluster/

Hue

https://bigdatalab.polito.it:8080/

To access the YARN Job History, YARN Resource Manager and HDFS web interfaces, you need a valid Kerberos ticket on your PC and a browser that supports HTTP SPNEGO, as described above.
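If you want to test the Kerberos and SPNEGO setup without a browser, a sketch like the following may help. It assumes that your curl was built with GSS-API/SPNEGO support, that you have already obtained a ticket with kinit, and that you saved the lab Root CA certificate to a local file (the path passed to check_uis is hypothetical). The function names ui_urls and check_uis are made up for this example.

```shell
#!/bin/sh
# Sketch: check SPNEGO authentication against the Kerberos-protected
# web interfaces. Assumes curl with GSS-API/SPNEGO support and a valid ticket.

# Print the URLs of the HDFS, YARN Job History and YARN Resource Manager UIs.
ui_urls() {
    cat <<'EOF'
https://ma1-bigdata.polito.it:50470/
https://mb1-bigdata.polito.it:50470/
https://ma1-bigdata.polito.it:19890/jobhistory/
https://ma1-bigdata.polito.it:8090/cluster/
https://mb1-bigdata.polito.it:8090/cluster/
EOF
}

# Print the HTTP status code returned by each interface.
# Usage: check_uis <ca_certificate_file>
check_uis() {
    ca_file=$1
    ui_urls | while read -r url; do
        # "--negotiate -u :" makes curl authenticate with the current ticket.
        code=$(curl --negotiate -u : --cacert "$ca_file" -s -o /dev/null \
                    -w '%{http_code}' "$url")
        echo "$code $url"
    done
}
```

A 401 status usually indicates that the negotiation failed, for example because no valid ticket is available. Other codes depend on the state of each service, so treat the output as a hint rather than a definitive diagnosis.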

Security Information

All HTTPS and TLS connections to the BigData@Polito servers use certificates signed by our BigData Lab Root CA. You can either add security exceptions in your browser for all the servers involved, or trust the BigData Lab Root CA by installing its certificates on your system, in this order:

  1. BigData Lab Root CA certificate
  2. BigData Lab Intermediate CA certificate

Using your own Hadoop system

For complex development or special needs, e.g. large local disk space, we suggest that you deploy your own local Hadoop system and configure it to act as a client for the BigData@Polito cluster.

The software currently used on BigData@Polito is the Cloudera CDH 5.12.1 distribution. Consult the installation requirements and instructions for details: you will probably need to perform an "unmanaged deployment" using the provided packages. If the current Cloudera version is more recent than the one on the cluster, follow the instructions about "Installing an Earlier Release".

You still need valid BigData@Polito credentials and the correct Kerberos configuration, and you must verify that AES-256 encryption is enabled in your Java JDK. More information here.
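One possible way to check the AES-256 requirement from the command line is sketched below. It assumes that the jrunscript tool shipped with your JDK is on your PATH, and it queries the standard javax.crypto.Cipher.getMaxAllowedKeyLength method; the function name jdk_allows_aes256 is made up for this example.

```shell
#!/bin/sh
# Sketch: check whether the JDK allows AES keys of at least 256 bits.
# Assumes the jrunscript tool shipped with the JDK is on your PATH.

# Succeed when the JVM reports a maximum AES key length of 256 bits or more.
jdk_allows_aes256() {
    result=$(jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES") >= 256)')
    [ "$result" = "true" ]
}
```

For example, jdk_allows_aes256 && echo "AES-256 is available" prints the message only when the check succeeds. If the check fails, your JDK is probably using a restricted cryptography policy.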

After installing the software, you need to configure it to interact with the BigData@Polito cluster by downloading the following Client Configuration files and deploying them in the respective configuration directories:

Please note that we cannot help you directly with the deployment of your personal Hadoop system, beyond giving you general information about the required software versions and providing the Client Configuration files. For any other problem, you must contact your IT support.

Useful Resources