title Script action - Install Python packages with Jupyter on Azure HDInsight | Microsoft Docs
description Step-by-step instructions on how to use script action to configure Jupyter notebooks available with HDInsight Spark clusters to use external python packages.
services hdinsight
documentationcenter
author nitinme
manager jhubbard
editor cgronlun
tags azure-portal
ms.assetid 21978b71-eb53-480b-a3d1-c5d428a7eb5b
ms.service hdinsight
ms.custom hdinsightactive
ms.workload big-data
ms.tgt_pltfrm na
ms.devlang na
ms.topic article
ms.date 06/29/2017
ms.author nitinme

Use Script Action to install external Python packages for Jupyter notebooks in Apache Spark clusters on HDInsight


Learn how to use Script Actions to configure an Apache Spark cluster on HDInsight (Linux) to use external, community-contributed Python packages that are not included in the cluster by default.

Note

You can also configure a Jupyter notebook to use external packages by using the %%configure magic. For instructions, see Use external packages with Jupyter notebooks in Apache Spark clusters on HDInsight.

You can search the Python Package Index (PyPI) for the complete list of available packages. You can also get packages from other sources. For example, you can install packages made available through Anaconda or conda-forge.
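Before you add a script action, it can help to confirm which packages the cluster's Python environment is actually missing. The following is a minimal sketch meant to run in a PySpark notebook cell. It checks only the interpreter that runs the notebook (the driver), and the helper name `missing_packages` and the package names passed to it are illustrative assumptions.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that the current
    Python interpreter (the notebook's driver process) cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when no importer can locate the module;
        # for a top-level name it does not import or execute the module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


print(missing_packages(["numpy", "tensorflow"]))
```

The import name can differ from the package name on PyPI or conda. For example, the `scikit-learn` package is imported as `sklearn`.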

In this article, you learn how to install the TensorFlow package on your cluster by using a script action, and how to use it from a Jupyter notebook.

Prerequisites

You must have the following:

Use external packages with Jupyter notebooks

  1. In the Azure portal, on the startboard, click the tile for your Spark cluster (if you pinned it to the startboard). You can also navigate to your cluster under Browse All > HDInsight Clusters.

  2. From the Spark cluster blade, click Script Actions under Usage. Run a custom script action that installs TensorFlow on the head nodes and the worker nodes. The bash script is available at https://hdiconfigactions.blob.core.windows.net/linuxtensorflow/tensorflowinstall.sh. For details, see the documentation on how to use custom script actions.

    [!NOTE] There are two Python installations in the cluster. Spark uses the Anaconda Python installation located at /usr/bin/anaconda/bin. Reference that installation in your custom actions via /usr/bin/anaconda/bin/pip and /usr/bin/anaconda/bin/conda.

  3. Open a PySpark Jupyter notebook.

    Create a new Jupyter notebook

  4. A new notebook is created and opened with the name Untitled.ipynb. Click the notebook name at the top, and enter a friendly name.

    Provide a name for the notebook

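  5. Import TensorFlow and run a "hello world" example:

     import tensorflow as tf
     # Build a graph containing a single string constant
     hello = tf.constant('Hello, TensorFlow!')
     # Create a session and evaluate the constant (TensorFlow 1.x API;
     # in TensorFlow 2.x, tf.Session is available only as tf.compat.v1.Session)
     sess = tf.Session()
     print(sess.run(hello))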
    

    The result will look like this:

    TensorFlow code execution
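The note in step 2 says that custom actions should call pip and conda from the Anaconda installation at /usr/bin/anaconda/bin, not from the system Python. The contents of tensorflowinstall.sh are not reproduced here, so the following is only a hypothetical sketch of the kind of command such a script action might run. The helper names `install_command` and `run_install`, and the use of the conda-forge channel, are assumptions for illustration.

```python
import subprocess

# Anaconda installation that Spark uses on HDInsight, per the note in step 2.
ANACONDA_BIN = "/usr/bin/anaconda/bin"


def install_command(package, use_conda=False, channel=None):
    """Build (but do not run) an argv list that installs `package` into the
    Anaconda Python environment. Hypothetical helper for illustration."""
    if use_conda:
        cmd = [f"{ANACONDA_BIN}/conda", "install", "--yes"]
        if channel:
            # For example, channel="conda-forge" installs from conda-forge.
            cmd += ["--channel", channel]
        cmd.append(package)
    else:
        if channel:
            raise ValueError("channel applies only to conda installs")
        cmd = [f"{ANACONDA_BIN}/pip", "install", package]
    return cmd


def run_install(package, **kwargs):
    # Runs on the current node only; a script action executes the equivalent
    # shell command on each head and worker node it targets.
    subprocess.run(install_command(package, **kwargs), check=True)
```

Building the command as a list rather than a single shell string avoids quoting problems when it is passed to `subprocess.run`.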
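Because the script action targets both head and worker nodes, you may also want to confirm that a package is importable on the executors, not just on the driver. The following minimal sketch assumes the SparkContext `sc` that the PySpark Jupyter kernel predefines. The helper names are hypothetical, and with a small number of tasks Spark is not guaranteed to schedule work on every worker node. If the note in step 2 holds, the reported interpreter path should be under /usr/bin/anaconda/bin.

```python
import importlib.util
import socket
import sys


def probe_module(module_name):
    """Report (hostname, interpreter path, whether module_name is locatable)
    for the Python process this function runs in."""
    found = importlib.util.find_spec(module_name) is not None
    return (socket.gethostname(), sys.executable, found)


def check_on_executors(sc, module_name, num_tasks=16):
    """Run probe_module inside Spark tasks and return the distinct results.
    `sc` is assumed to be the SparkContext provided by the PySpark kernel."""
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    results = rdd.map(lambda _: probe_module(module_name)).distinct().collect()
    return sorted(results)


# In a PySpark notebook cell:
# check_on_executors(sc, "tensorflow")
```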
