Section overview:
- First things first: Basic toolbox
- Managing Python Versions: pyenv
- Managing Libraries: pip
- Managing Environments: pipenv
- Jupyter lab & virtual environments
Intro
Setting up your work environment from scratch might be something you do every three, four or five years, depending on how often you get a new machine. After that it’s mainly upgrading, but the core tools (git, brew etc.) are there, and your system variables are meant to work.
However, whenever we plan a Python-related workshop where we don’t know exactly what hardware to expect on the participants’ side, or when we have a standalone app we want to share (e.g. an Electron app with a Python backend), the larger setup of a system becomes important.
In the end, if a similar environment is critical for a workshop and we don’t want to spend half a day getting the equipment running, my first choice would still be an online tool (e.g. Google’s Colaboratory, which is based on open source Jupyter Notebooks) or, if it’s about demonstrating an application, maybe a cloud service (e.g. Heroku) or AWS in combination with a dockerized solution.
Anyway, these approaches have their own disadvantages, and sometimes it’s easier to list the steps to recreate similar working environments and point to the caveats. The following workflow is tested on OS X Yosemite (version 10.10.5) and Mojave (10.14.6) with no major differences as far as the installations are concerned. Realpython also published a comprehensive overview including Windows Terminals (for command line tools) and Conda (a package and environment management system as part of Anaconda, a Python distribution focused on scientific applications).
Getting a toolbox ...
Before we install anything new, it’s a safe step to check what we already have. A typical Mac comes with Python 2.7 (owned by macOS). That’s it. What we need, however, is:
- Homebrew, a package manager for macOS; the Homebrew website lists a curl statement you can copy and paste into your terminal
- via homebrew we then get the latest version of Python3 as well as pip3
$ brew install python3
- as just outlined, we get pip as a courtesy from python3 – still, good to know that pip is a Package Manager for Python, managing additional libraries and dependencies that are not distributed as part of the standard library
- git, a Version Control System
- shell configuration via ~/.bash_profile including aliases and PATH management
- for a controlled development environment, we also want pipenv for project-specific libraries and pyenv for project-specific python versions
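Checking what is already installed can also be done from Python. The following is a minimal sketch, not part of the original workflow: it uses shutil.which from the standard library to look up each tool on the PATH. The names in TOOLBOX are an assumption based on the list above, and find_tools is a hypothetical helper name.

```python
import shutil

# Tool names taken from the list above; adjust to your own setup (assumption).
TOOLBOX = ["brew", "python3", "pip3", "git", "pipenv", "pyenv"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLBOX).items():
        print(f"{name:<8} {path or 'not found'}")
```

Anything reported as not found is a candidate for the installation steps in this section.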
Managing Python Versions: pyenv
Managing Python versions becomes important when we want to run or further develop applications that were written using a different Python version than the one we have installed. This isn’t necessarily a problem with backwards-compatible minor versions; however, changing from Python 2 to 3 can require a major rewrite.
How do we get pyenv, especially the latest version, which at the time of writing was 1.2.26 (see pyenv --version)? The first option, pip install pyenv, only gets you up to 1.2.21, which includes Python 3.7 but not 3.9.x. So we have to git clone the pyenv GitHub repository into /Users/username/.pyenv and add a few lines to .bash_profile as described in the README of the repo. If you don't need the very latest Python version, you are just fine with the pip install method.
The following are some key commands for pyenv:
# listing all possible installations, including Anaconda, micropython, pypy, miniconda, jython etc.
$ pyenv install --list
# installing Python 3.9.4 and making it the global version
$ pyenv install 3.9.4
$ pyenv global 3.9.4
# listing all versions installed
$ pyenv versions
# checking where Python executables are located
$ type -a python
# results in something like ..
python is /Users/me/.pyenv/shims/python
python is /usr/local/bin/python
python is /usr/bin/python
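The same check can be made from inside Python, which is handy when it is unclear which of the executables listed above actually runs a script. This is a small sketch under stated assumptions: describe_interpreter is a hypothetical helper, and the test for a ".pyenv" folder in the path is only a heuristic based on the default install location mentioned earlier.

```python
import sys
from pathlib import Path


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    executable = Path(sys.executable or "")
    return {
        "executable": str(executable),
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Heuristic (assumption): pyenv-managed interpreters live under ~/.pyenv
        "managed_by_pyenv": ".pyenv" in executable.parts,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```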
Setting a version with pyenv is not strictly needed if we use pipenv as described in the next section. But we can also cd into a new folder and define a local Python version via pyenv local 3.7.10. This creates a .python-version file in that folder. If you also use the pyenv-virtualenv plugin, running the following in a terminal is essential:
# Add pyenv-virtualenv initializer to shell startup script
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
# Reload your profile
source ~/.bash_profile
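To make the role of the .python-version file more concrete, here is a simplified sketch of the lookup order pyenv follows when it decides which version applies to a folder: the PYENV_VERSION environment variable first, then a .python-version file in the current folder or one of its parents, then the global version file. The function name and parameters are hypothetical, and real pyenv handles more cases (for example several versions in one file).

```python
import os
from pathlib import Path


def resolve_python_version(start_dir, env=None, global_version_file=None):
    """Simplified sketch of how pyenv picks a version for a folder."""
    env = os.environ if env is None else env

    # 1. An explicit PYENV_VERSION environment variable wins.
    if env.get("PYENV_VERSION"):
        return env["PYENV_VERSION"]

    # 2. Otherwise look for .python-version here and in every parent folder.
    current = Path(start_dir).resolve()
    for folder in (current, *current.parents):
        candidate = folder / ".python-version"
        if candidate.is_file():
            entries = candidate.read_text().split()
            if entries:
                return entries[0]

    # 3. Fall back to the global version file, then to the system Python.
    if global_version_file is not None and Path(global_version_file).is_file():
        entries = Path(global_version_file).read_text().split()
        if entries:
            return entries[0]
    return "system"
```

Because parent folders are searched as well, a single .python-version file at the project root covers all of its subfolders.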
Managing Libraries: pip
Before we can talk about pipenv we need to look into pip. Pip is a recursive acronym for ‘Pip installs packages’ and connects to an online repository (PyPI, the Python Package Index). By default, packages are installed to the running Python installation's site-packages directory. So it can happen that after changing the Python version, we need to install the required packages again into the new site-packages folder.
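To see which site-packages folder belongs to the interpreter that is currently running, the standard library can be asked directly. This is a minimal sketch; site_packages_dir is a hypothetical helper name, and it reports the location for pure-Python packages only.

```python
import sys
import sysconfig


def site_packages_dir():
    """Directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print("interpreter  :", sys.executable)
    print("site-packages:", site_packages_dir())
```

Running this once per Python version shows why packages installed for one version are not visible to another.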
# pip list -v ... list installed packages (verbose)
Package    Version Location                               Installer
---------- ------- -------------------------------------- ---------
numpy      1.20.2  /usr/local/lib/python3.9/site-packages pip
pip        21.1    /usr/local/lib/python3.9/site-packages pip
setuptools 51.0.0  /usr/local/lib/python3.9/site-packages
wheel      0.36.1  /usr/local/lib/python3.9/site-packages
# pip list -o ... list outdated packages (example taken from T-Test blog post)
Package      Version   Latest    Type
------------ --------- --------- -----
certifi      2020.11.8 2020.12.5 wheel
matplotlib   3.3.3     3.4.1     wheel
numpy        1.19.4    1.20.2    wheel
outdated     0.2.0     0.2.1     wheel
pandas       1.1.4     1.2.4     wheel
pingouin     0.3.8     0.3.11    sdist
pip          21.0.1    21.1      wheel
scikit-learn 0.23.2    0.24.1    wheel
Any update works similar to this: pip install ipywidgets --upgrade
Finally, we can also check whether specific packages are installed (e.g. those related to Jupyter notebooks) with pip list | grep jup
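A similar lookup is possible without leaving Python. The sketch below assumes Python 3.8 or newer, where importlib.metadata is part of the standard library; installed_packages is a hypothetical helper that filters by a substring of the package name, roughly like piping the package list through grep.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages(name_filter=""):
    """Return sorted (name, version) pairs whose name contains name_filter."""
    needle = name_filter.lower()
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and needle in name.lower():
            found.add((name, dist.metadata.get("Version")))
    return sorted(found, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages("jup"):
        print(name, version)
```

Note that this only sees packages installed for the interpreter that runs the script.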
Managing Environments: pipenv
Pipenv handles virtual environments and package dependencies. For a closer look into packages and modules have a look here. Unlike pip in combination with virtualenv, pipenv is a single tool that both manages dependencies and creates isolated virtual environments. It also auto-updates the Pipfiles, explained in more detail below.
If we need a virtual environment with a specific Python version, we can get it through pipenv --python 3.6. If that happens at a later stage, when we already have some additional packages installed, we can delete the environment with pipenv --rm and simply rebuild it with pipenv install, which will automatically pick up the information provided in the Pipfile.
If you have pyenv installed, Pipenv will ask you if you want to install a required version of Python if it’s not available yet.
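After creating or rebuilding an environment it can be useful to confirm from inside Python that the isolated environment is really the one in use. This is a minimal sketch with hypothetical helper names. It assumes a reasonably recent virtualenv, where sys.prefix and sys.base_prefix differ inside an environment, and it assumes that Pipenv marks its shells and runs with the environment variable PIPENV_ACTIVE set to 1.

```python
import os
import sys


def in_virtual_environment():
    """True if this interpreter runs inside a virtual environment."""
    # Inside an environment sys.prefix points at the environment itself,
    # while sys.base_prefix still points at the underlying installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def started_by_pipenv(env=None):
    """True if Pipenv's marker variable is set (assumption: PIPENV_ACTIVE=1)."""
    env = os.environ if env is None else env
    return env.get("PIPENV_ACTIVE") == "1"


if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment())
    print("started by pipenv  :", started_by_pipenv())
    print("python version     :", sys.version.split()[0])
```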
# the following commands are largely self-explanatory ...
$ pipenv install pytest --dev
$ pipenv update [package-name]
$ pipenv uninstall [package-name]
Every change is also automatically reflected in the Pipfile and Pipfile.lock. The essential difference between the two files is that Pipfile describes a working project set-up (e.g. a library version should be more recent than 3.2.1, or could be any version, written as *). Pipfile.lock is more precise in the sense that it locks in a specific version of every library, i.e. the one currently installed.
# A possible Pipfile structure for a generic machine learning project
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
pytest = "*"
[packages]
scikit-learn = "*"
pandas = "*"
plotly-express = "*"
numpy = "*"
plotly = "*"
[requires]
python_version = "3.8"
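While the Pipfile is written in TOML, Pipfile.lock is a JSON document with a "default" section for regular packages and a "develop" section for dev packages, each entry carrying an exact "==" pin. The sketch below reads such pins with the standard library. The sample text is a made-up excerpt (hashes and metadata omitted, version numbers purely illustrative), and pinned_versions is a hypothetical helper name.

```python
import json

# Made-up excerpt in the shape of a Pipfile.lock (hashes and metadata omitted).
SAMPLE_LOCK = """
{
  "default": {
    "numpy": {"version": "==1.20.2"},
    "pandas": {"version": "==1.2.4"}
  },
  "develop": {
    "pytest": {"version": "==6.2.3"}
  }
}
"""


def pinned_versions(lock_text, include_dev=True):
    """Return {package: exact version} from Pipfile.lock-style JSON text."""
    lock = json.loads(lock_text)
    sections = ["default", "develop"] if include_dev else ["default"]
    pins = {}
    for section in sections:
        for name, entry in lock.get(section, {}).items():
            version = entry.get("version", "")
            # Drop the leading "==" so only the version number remains.
            pins[name] = version[2:] if version.startswith("==") else version
    return pins


if __name__ == "__main__":
    print(pinned_versions(SAMPLE_LOCK))
```

For a real project, the text would come from reading the Pipfile.lock in the project folder instead of the sample string.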
If we want to see what packages have been installed in our virtual environment, we can use pip list -v again or, if we want to see a tree structure, pipenv graph.
Another useful command is pipenv check, which highlights vulnerabilities, suggesting a need to update affected packages.
# a possible output could be ...
Checking installed package safety...
39611: pyyaml <5.4 resolved (5.3.1 installed)!
A vulnerability was discovered in the PyYAML library in versions before 5.4, where it is susceptible to arbitrary code execution
Last but not least, the following will generate a requirements.txt file out of your Pipfile: $ pipenv lock -r --dev > requirements.txt (more recent Pipenv releases provide pipenv requirements --dev for the same purpose). Requirements files are often needed by app hosting platforms such as Heroku, but since Pipfiles are meant to replace requirements.txt, they are (almost always?) accepted as alternatives.
Jupyter lab and virtual environments (kernels)
Effective environment management saves time and allows developers to create an isolated software product such that collaborators or contributors can recreate your environment and run your code. This, of course, also applies to Jupyter notebooks. You can find a more extensive wrap-up on how to use Jupyter notebooks here or here.
Pipenv, as introduced above, provides a standardized way to install project dependencies as well as testing and development requirements. Jupyter Lab is mainly a browser-based, very interactive development environment, which you get via pip install jupyterlab and start via jupyter lab.
The default starting folder is ’/tree/’. If you prefer a customized path, you need to go through the following steps:
- run jupyter notebook --generate-config
- this generates the file /Users/username/.jupyter/jupyter_notebook_config.py
- cd into that folder so you can edit the config file
- search for the following line in the file: #c.NotebookApp.notebook_dir = '' and replace it with c.NotebookApp.notebook_dir = '/the/path/to/desired/folder/'
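The last step of the list above can also be scripted. The following is only a sketch of the text replacement, with a hypothetical helper name set_notebook_dir: it works on the content of the config file as a string, replaces the first notebook_dir line (commented out or not) and appends the setting if no such line exists. Reading and writing the actual file is left out on purpose.

```python
import re

# Matches the notebook_dir setting, commented out or not, on a line of its own.
_NOTEBOOK_DIR = re.compile(
    r"^#?[ \t]*c\.NotebookApp\.notebook_dir[ \t]*=.*$", re.MULTILINE
)


def set_notebook_dir(config_text, folder):
    """Return config_text with notebook_dir pointing at folder."""
    new_line = f"c.NotebookApp.notebook_dir = {folder!r}"
    # A function as replacement keeps backslashes in the path untouched.
    updated, count = _NOTEBOOK_DIR.subn(lambda _match: new_line, config_text, count=1)
    if count == 0:
        # No existing setting found: append it as a new last line.
        separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
        updated = config_text + separator + new_line + "\n"
    return updated


if __name__ == "__main__":
    sample = "# Configuration file\n#c.NotebookApp.notebook_dir = ''\n"
    print(set_notebook_dir(sample, "/the/path/to/desired/folder/"))
```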
Next we want to reuse our pipenv configurations by installing the necessary kernel:
# 1st step
# cd into project folder and activate the virtual environment
pipenv shell
# 2nd step
pipenv install ipykernel
# 3rd step
# ml_scikit can be replaced by any name of your choosing
python -m ipykernel install --user --display-name ml_scikit --name ml_scikit
As a result, you should be able to run your notebook with that specific kernel (drop down menu – right upper corner).
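If the kernel does not show up in the menu, it helps to look at what was actually registered. A kernel is described by a kernel.json file inside a folder named after the kernel. The sketch below lists those files; list_kernels is a hypothetical helper, and the default folder is an assumption about where per-user kernels end up on macOS.

```python
import json
from pathlib import Path

# Assumption: per-user kernels on macOS are stored here by --user installs.
DEFAULT_KERNELS_DIR = Path.home() / "Library" / "Jupyter" / "kernels"


def list_kernels(kernels_dir=DEFAULT_KERNELS_DIR):
    """Return {kernel name: display name} for every kernel.json found."""
    kernels = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        return kernels
    for spec_file in sorted(root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = spec.get("display_name", "")
    return kernels


if __name__ == "__main__":
    for name, display_name in list_kernels().items():
        print(f"{name}: {display_name}")
```

The folder name corresponds to the value passed via --name and the display name to the value passed via --display-name in the third step.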
Finally, a great feature of notebooks is extensions. You can first check out a list of available extensions. The GitHub repo summarizes five steps for installing extensions:
# install notebook extensions
pip install jupyter_contrib_nbextensions
# copying extensions into jupyter server directories & configuration
jupyter contrib nbextension install --user
# installing the configurator
pip install jupyter_nbextensions_configurator
# Configuring the notebook server to load the server extension
jupyter nbextensions_configurator enable --user
# Restart the server with ‘jupyter notebook’ & then select extensions under the ‘NBextensions’ menu (this process is slightly different when using ‘jupyter lab’)
Additional Links
Introduction to Anaconda https://realpython.com/python-windows-machine-learning-setup/
Pyenv and Shims https://mungingdata.com/python/how-pyenv-works-shims/