Maybe you want to avoid burning up a bunch of cash on a cloud-based Spark machine learning and analytics platform. Or maybe you need a data source or language it doesn’t support. Or maybe you’re just more of an open source or roll-your-own kinda person? Enter Apache Zeppelin. It has won InfoWorld Bossie awards.
It is a multi-user, multilanguage, multiplatform notebook for analytics and visualization. With Zeppelin, you can pull data from multiple sources (Oracle, for example) and analyze it with a variety of tools. You can write some of your code in Scala, some in R, and some in Python (among others) and then visualize the results with pretty charts and stuff.
Zeppelin isn’t difficult to install, but if you want to get it running for multiple users on Amazon Web Services, you have to do a few steps. If you haven’t used AWS’s EC2 before, check out my earlier articles on it first. You can install Zeppelin on Windows or Linux, but I suggest Linux: it is a tad lighter weight, and you will find more community documentation for it.
Without further ado, let’s get started!
Go to the Zeppelin download page and click Binary Package with All Interpreters, which at this writing is zeppelin-0.7.2-bin-all.tgz.
Now, copy the link under your suggested mirror.
Back in the Terminal where you SSHed to your instance, use wget to get a copy of Zeppelin: enter wget followed by the link you copied, and press Return. (You’ll be doing this as the “zeppelin” user if you’ve been following directions.)
After it downloads, untar Zeppelin. The command is tar -xzf zeppelin-0.7.2-bin-all.tgz; type it and press Return. (Your filename may be different if a new version has come out.)
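Now, create a softlink called zeppelin-current. First, type ls -l and press Return to see what the unarchived directory is called. Then type ln -s yourdirectoryname zeppelin-current to create the softlink. For me, this is ln -s zeppelin-0.7.2-bin-all zeppelin-current. (The startup configuration you’ll create later points at zeppelin-current, so a future upgrade only means repointing this link.)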
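If you would rather script the untar-and-link steps, here is a minimal sketch. It assumes you have already downloaded the tarball with wget into the current directory, and that the first entry in the archive is the release’s top-level directory (check with tar -tzf if in doubt). The function name unpack_zeppelin is my own invention, not something Zeppelin ships.

```shell
#!/usr/bin/env bash
# Sketch: unpack a downloaded Zeppelin tarball and point zeppelin-current at it.
# unpack_zeppelin is a hypothetical helper, not part of Zeppelin. Run it in the
# directory where wget saved the tarball (the zeppelin user's home directory).
unpack_zeppelin() {
  local tarball="$1"
  local topdir
  # Assumes the archive's first entry is its top-level directory,
  # e.g. "zeppelin-0.7.2-bin-all/"; keep only the part before the first slash.
  topdir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  if [ -z "$topdir" ]; then
    echo "Could not read $tarball" >&2
    return 1
  fi
  # Same as typing: tar -xzf zeppelin-0.7.2-bin-all.tgz
  tar -xzf "$tarball" || return 1
  # -s makes a symbolic link; -f replaces an existing zeppelin-current;
  # -n treats an existing link to a directory as a plain file, so the link
  # itself is replaced instead of a new link being created inside it.
  ln -sfn "$topdir" zeppelin-current
}
```

For example, unpack_zeppelin zeppelin-0.7.2-bin-all.tgz followed by cd zeppelin-current. The -n flag matters when you rerun it for a newer release: without it, ln would follow the old zeppelin-current link and drop the new link inside the old release directory.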
Change directory to zeppelin-current: type cd zeppelin-current and press Return.
Now start Zeppelin. Type bin/zeppelin-daemon.sh start and press Return.
Test that you can reach Zeppelin. From your browser, go to http://yourinstanceip:8080/zeppelin. You should see the welcome screen. If so, you’ve installed Zeppelin and didn’t misconfigure the EC2 security groups!
Step 6: Set up Zeppelin authentication
However, right now you’re a very powerful anonymous user, and so is anyone else who can reach that page. So enable authentication and multi-user access.
To do so, first stop Zeppelin. In the Terminal, type bin/zeppelin-daemon.sh stop and press Return.
To secure Zeppelin, first copy the configuration template for Apache Shiro (the security framework Zeppelin uses) by typing cp conf/shiro.ini.template conf/shiro.ini and pressing Return. Then copy the site configuration template by typing cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml and pressing Return; you’ll edit that file next.
Your choice of editor is a big deal. If you’re an experienced Linux user like me, you’ll use the really great VI editor: type vi conf/zeppelin-site.xml and press Return on your instance. (To exit VI, press Esc, then type :q and press Return.) If you’re new, you can use a friendlier editor called Nano instead: type nano conf/zeppelin-site.xml and press Return. (Exit Nano by pressing Control-X.) My examples use VI, but you can replace vi with nano in any command if you prefer Nano.
Now, disable anonymous access. In the editor, change the value of the zeppelin.anonymous.allowed property to false.
Consider making all notebooks private by default, too. To do this, change the value of the zeppelin.notebook.public property to false.
Save zeppelin-site.xml and exit the editor. In VI, this means pressing Esc, typing :wq, and pressing Return. In Nano, this means pressing Control-X and telling it you want to save when prompted.
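Hand-editing XML is easy to fumble, so here is a sketch of the same two changes made non-interactively with sed. It assumes the layout of the stock zeppelin-site.xml.template, where each property’s value line sits directly below its name line; set_site_property is a hypothetical helper name, and it keeps a zeppelin-site.xml.bak copy of the original file.

```shell
#!/usr/bin/env bash
# Sketch: set one property's <value> in zeppelin-site.xml without an editor.
# set_site_property is a hypothetical helper. Assumes the stock template layout,
# where a property's <value> line comes directly after its <name> line.
set_site_property() {
  local file="$1" name="$2" value="$3"
  # On the line holding <name>NAME</name>: read the next line (n), then
  # replace whatever is inside its <value>...</value> element.
  # -i.bak edits the file in place and keeps the original as FILE.bak.
  sed -i.bak "/<name>${name}<\/name>/{n;s|<value>[^<]*</value>|<value>${value}</value>|;}" "$file"
}
```

As the zeppelin user, from the zeppelin-current directory, you would run set_site_property conf/zeppelin-site.xml zeppelin.anonymous.allowed false and then set_site_property conf/zeppelin-site.xml zeppelin.notebook.public false. It’s worth opening the file afterward to confirm both values changed, since a template with a different layout would simply be left untouched.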
Now, exit back to the Ubuntu user. Earlier, you typed sudo su - zeppelin to switch to the zeppelin user. Now you want out, so type exit and press Return.
Create a startup configuration for Zeppelin. You want Zeppelin to be managed by systemd so that you can type sudo service zeppelin start or sudo service zeppelin stop, and so that Zeppelin starts whenever you start your EC2 instance. Type sudo vi /etc/systemd/system/zeppelin.service and add the following content:
[Unit]
Description=Service to run Zeppelin Daemon
Documentation=

[Service]
User=zeppelin
Group=zeppelin
Type=forking
WorkingDirectory=/home/zeppelin
ExecStart=/home/zeppelin/zeppelin-current/bin/zeppelin-daemon.sh start
ExecStop=/home/zeppelin/zeppelin-current/bin/zeppelin-daemon.sh stop

[Install]
WantedBy=multi-user.target
Then save and exit the editor.
Now you can enable the startup configuration. First, make sure systemd has loaded it by typing sudo systemctl daemon-reload and pressing Return. You’ll need to do that any time you change the zeppelin.service file.
Next, enable the service to run at startup by typing sudo systemctl enable zeppelin and pressing Return.
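If you prefer not to type the unit file into vi, the sketch below prints the same [Unit], [Service], and [Install] sections from a shell function so you can pipe them into place with sudo tee. The function name zeppelin_unit and its optional home-directory argument are my own additions, and the optional Documentation line is left out.

```shell
#!/usr/bin/env bash
# Sketch: print the zeppelin.service unit described above.
# zeppelin_unit is a hypothetical helper; $1 overrides the zeppelin user's
# home directory (defaults to /home/zeppelin, as in the tutorial).
zeppelin_unit() {
  local home="${1:-/home/zeppelin}"
  # Unquoted EOF so that ${home} expands inside the here-document.
  cat <<EOF
[Unit]
Description=Service to run Zeppelin Daemon

[Service]
User=zeppelin
Group=zeppelin
Type=forking
WorkingDirectory=${home}
ExecStart=${home}/zeppelin-current/bin/zeppelin-daemon.sh start
ExecStop=${home}/zeppelin-current/bin/zeppelin-daemon.sh stop

[Install]
WantedBy=multi-user.target
EOF
}
```

For example: zeppelin_unit | sudo tee /etc/systemd/system/zeppelin.service > /dev/null, followed by sudo systemctl daemon-reload and sudo systemctl enable zeppelin as above. Piping through sudo tee is needed because a plain > redirect would run with your own permissions, not root’s.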
Step 7: Start Zeppelin
You can now start Zeppelin as a daemon: type sudo service zeppelin start and press Return. Wait a minute or two for it to come up.
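Rather than guessing at “a minute or two,” you can poll until Zeppelin answers. This sketch assumes curl is installed on the instance (Ubuntu server images normally include it) and that you run it on the instance itself, where Zeppelin listens on port 8080. The helper name wait_for_zeppelin and its retry count are illustrative choices, not anything Zeppelin provides.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until Zeppelin's web server answers, or give up.
# wait_for_zeppelin is a hypothetical helper; assumes curl is installed.
wait_for_zeppelin() {
  local url="$1"              # e.g. http://localhost:8080/
  local attempts="${2:-24}"   # 24 attempts, 5 seconds apart: about two minutes
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Without -f, curl succeeds on any HTTP response (even an error page),
    # and fails only if it cannot connect or hits the 5-second timeout.
    if curl -sS -o /dev/null --max-time 5 "$url"; then
      echo "Zeppelin is answering at $url"
      return 0
    fi
    # Don't sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep 5
    fi
    i=$((i + 1))
  done
  echo "No answer from $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_zeppelin http://localhost:8080/ and then head to your browser once it reports success. If it gives up, journalctl -xe is the place to look for the reason.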
Go to Zeppelin again by entering http://yourinstanceip:8080/zeppelin in your browser. When Zeppelin’s welcome screen appears, click Login. Log in with admin as your username and password1 as your password. (These are the default credentials in the shiro.ini file you copied from the template.)
Step 8: Create a new notebook
To create a new notebook, choose Notebook > Create New Note from the menu at the top of the welcome screen. For this example, call your notebook “Spark Notebook.”
Now that you have a notebook, use it! Apache that I recommend you try out.
When done, shut down your instance in EC2. (Save your money!)
Next steps for using Zeppelin
Now that you have Zeppelin up and running, here are some of the things you may want to do with it:
- You may want to restart your instance and make sure that Zeppelin starts with it. If it doesn’t, type journalctl -xe and press Return in the Terminal to see what went wrong.
- You might want to change the passwords, users, or roles by editing conf/shiro.ini.
- You may want to check on the .
- Right now, you’re using plaintext users and passwords stored in shiro.ini. If you have an LDAP server or some other authentication source, you could configure Zeppelin to use it instead.
- You might want to . You’ll want a domain name for your server, which means getting a fixed IP address for it as well.
- If you do have multiple users, you need to configure the EC2 security group with either their IP addresses or a range of IP addresses (check out my earlier EC2 article for an explanation of how to do this).
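Since those plaintext credentials live in shiro.ini, the first thing worth changing is the default admin password. Here is a minimal sed-based sketch; it assumes the template’s “user = password, role” line format (a user line with no roles would not match), and the helper name set_shiro_password is hypothetical. Passwords containing |, &, or \ would need escaping before being handed to sed.

```shell
#!/usr/bin/env bash
# Sketch: change one user's password in shiro.ini's [users] section.
# set_shiro_password is a hypothetical helper. Assumes lines shaped like
# "admin = password1, admin" (password, then a comma and the user's roles).
set_shiro_password() {
  local file="$1" user="$2" newpass="$3"
  # Replace everything between "user =" and the first comma (the password),
  # leaving the roles after the comma untouched. Keeps FILE.bak as a backup.
  sed -i.bak "s|^${user}[[:space:]]*=[[:space:]]*[^,]*,|${user} = ${newpass},|" "$file"
}
```

For example, as the zeppelin user in the zeppelin-current directory: set_shiro_password conf/shiro.ini admin 'a-much-better-password'. Then stop and start Zeppelin (sudo service zeppelin stop, then sudo service zeppelin start) so it rereads the file, and remember to delete shiro.ini.bak, which still holds the old password.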