## Install Python Data Science Packages

Python is a high-level, general-purpose programming language with many packages for data science and machine learning. As a first step, install Python on Windows, MacOS, or Linux.

#### Install Python Packages

Much of Python's power comes from the packages available through the pip or conda package managers. This page gives an overview of some of the best packages for machine learning and data science and shows how to install them.

We will explore the Python packages that are commonly used for data science and machine learning. Install the packages from a terminal, Anaconda prompt, or command prompt, or from a Jupyter Notebook. If you have multiple versions of Python or need specific package dependencies, use a version or environment manager such as **pyenv** or **conda**. For most users, a single Python installation is sufficient. The Python package manager **pip** installs all of the packages (such as **gekko**) that we need for this course. If there is an administrative access error, install to the local user profile with the **--user** flag, as in **pip install gekko --user**.

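#### Install Method #1

Install a package from a terminal, Anaconda prompt, or command prompt with **pip install gekko**. In a Jupyter Notebook cell, prefix the command with an exclamation mark: **!pip install gekko**.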

#### Install Method #2

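Packages can also be installed from a Python script, although this is not recommended.

```python
import subprocess, sys
# Call pip through the interpreter that runs this script (pip has no supported in-process API)
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'gekko'])
```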

#### List Package Version Numbers

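Many of these packages come pre-installed with distributions such as Anaconda. List the installed packages and their version numbers with **pip list** (or **conda list** in an Anaconda environment). An excerpt of the output from an Anaconda installation is shown below.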

| Package | Version |
|---|---|
| anaconda-client | 1.7.2 |
| anaconda-navigator | 1.10.0 |
| anaconda-project | 0.8.3 |
| beautifulsoup4 | 4.9.3 |
| conda | 4.9.2 |
| gekko | 1.0.4 |
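The same information is also available from inside Python. The sketch below assumes Python 3.8 or newer and uses the standard-library `importlib.metadata` module; `package_versions` is a hypothetical helper name for illustration, not part of any package.

```python
from importlib.metadata import distributions

def package_versions():
    """Return a dict that maps installed distribution names to version strings."""
    versions = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            versions[name] = dist.version
    return versions

# Print a name/version listing sorted case-insensitively, similar to pip list
for name, version in sorted(package_versions().items(), key=lambda kv: kv[0].lower()):
    print(f"{name:<34} {version}")
```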

Additional packages for visualization, data science, and machine learning are listed below.

#### Beautiful Soup

Beautiful Soup is a Python package for extracting (scraping) information from web pages. It works with an HTML or XML parser (such as lxml or Python's built-in **html.parser**) and provides functions for iterating, searching, and modifying the parse tree.
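A minimal sketch of parsing a page, assuming **beautifulsoup4** is installed. The HTML string and its example.com links are made up for illustration; in practice the text would come from a downloaded web page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document used only for this example
html = """
<html><head><title>Course Links</title></head>
<body>
  <a href="https://example.com/a">Page A</a>
  <a href="https://example.com/b">Page B</a>
</body></html>
"""

# Use Python's built-in parser; pass "lxml" instead if lxml is installed
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                            # text inside the <title> tag
links = [a.get("href") for a in soup.find_all("a")]  # target of every hyperlink
print(title)
print(links)
```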

#### Gekko

Gekko provides an interface to gradient-based solvers for machine learning and for optimization of mixed-integer, differential algebraic equation, and time-series models. Gekko provides exact first and second derivatives through automatic differentiation, and it discretizes differential equations with simultaneous or sequential methods.

#### Keras

Keras provides a high-level interface for building and training artificial neural networks. Keras versions 2.4 through 2.x use TensorFlow as the only backend; earlier versions supported other backends, and Keras 3 again supports JAX, TensorFlow, and PyTorch backends. The backend is installed separately, for example with **pip install tensorflow**.

#### Matplotlib

The package matplotlib generates plots in Python.
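A minimal plotting sketch, assuming NumPy and Matplotlib are installed. The non-interactive `Agg` backend and the output file name `trig.png` are choices made for this example so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit it to use plt.show() on a desktop
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("trig.png")  # write the figure to an image file
```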

#### NumPy

NumPy is a numerical computing package for mathematics, science, and engineering. Many data science packages use NumPy as a dependency.
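A short sketch of NumPy arrays and linear algebra: solving a small linear system. The matrix and right-hand side are arbitrary values chosen so the exact answer is easy to check.

```python
import numpy as np

# Solve 3*x0 + x1 = 9 and x0 + 2*x1 = 8 in the form A @ x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(A, b)
residual = A @ solution - b  # element-wise check; should be near zero
print(solution)              # close to [2, 3]
```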

#### OpenCV

OpenCV (Open Source Computer Vision Library) is a package for real-time computer vision that was originally developed at Intel Research. The Python package is installed with **pip install opencv-python** and imported as **cv2**.

#### Pandas

Pandas reads, manipulates, and visualizes data tables (DataFrames). Its functions for filtering, grouping, and summarizing data make the preliminary steps of data analysis efficient.
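A minimal sketch of common DataFrame operations. The heater names and temperature values are hypothetical data invented for this example.

```python
import pandas as pd

# Hypothetical temperature readings from two heaters
df = pd.DataFrame({
    "heater": ["Q1", "Q1", "Q2", "Q2", "Q2"],
    "temperature": [25.0, 27.0, 30.0, 32.0, 34.0],
})

print(df.describe())                                # summary statistics of numeric columns
means = df.groupby("heater")["temperature"].mean()  # average temperature per heater
print(means)
hot = df[df["temperature"] > 28.0]                  # boolean filtering of rows
```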

#### Plotly

Plotly renders interactive plots with HTML and JavaScript. Plotly Express is included with Plotly.

#### PyTorch

PyTorch is a deep learning framework for applications such as computer vision and natural language processing. It was originally developed by Facebook's AI Research lab (FAIR) and is now governed by the PyTorch Foundation.

#### Scikit-Learn

Scikit-Learn (or sklearn) includes a wide variety of classification, regression, and clustering algorithms, including neural networks, support vector machines, random forests, gradient boosting, k-means clustering, and other supervised and unsupervised learning methods.
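A minimal classification sketch using the iris dataset that ships with scikit-learn, so no download is needed. The random forest model, 30% test split, and `random_state` values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)             # train on 70% of the samples
accuracy = model.score(X_test, y_test)  # fraction of test samples classified correctly
print(f"Test accuracy: {accuracy:.2f}")
```

The same `fit` and `score` (or `predict`) pattern applies to the other estimators in the package.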

#### SciPy

SciPy is a general-purpose package for mathematics, science, and engineering and extends the base capabilities of NumPy.
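A short sketch of two SciPy capabilities that build on NumPy arrays: numerical integration and unconstrained optimization. The Rosenbrock test function and starting point are standard illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Integrate sin(x) from 0 to pi; the exact result is 2
area, error_estimate = quad(np.sin, 0.0, np.pi)

def rosenbrock(v):
    """Classic optimization test function with its minimum at (1, 1)."""
    x, y = v
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="BFGS")
print(area)
print(result.x)  # close to [1, 1]
```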

#### Seaborn

Seaborn is built on Matplotlib and produces detailed statistical plots in a few lines of code.
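A minimal sketch of a regression plot, assuming seaborn, pandas, and NumPy are installed. The data are synthetic (a line with random noise), and the `Agg` backend and file name `seaborn_regplot.png` are choices for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: y = 2x plus normally distributed noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(0, 1, 50)})

# One call draws the scatter points, fitted regression line, and confidence band
ax = sns.regplot(data=df, x="x", y="y")
ax.figure.savefig("seaborn_regplot.png")
```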

#### Statsmodels

Statsmodels is a package for exploring data, estimating statistical models, and performing statistical tests. It includes descriptive statistics, plotting functions, and result statistics for each estimated model.

#### Temperature Control Lab

The Temperature Control Lab is used throughout the course for hands-on activities such as the Learn Python and Data Science modules. Data can also be generated from a digital twin simulator if a TCLab device is not connected. Use **TCLabModel** to generate simulated data wherever **TCLab** is used to connect Python to the physical lab.

#### TensorFlow

TensorFlow is an open-source machine learning platform with a particular focus on training and inference of deep neural networks. It was developed by the Google Brain team.

#### XGBoost

XGBoost is an open-source gradient boosting library for Python and other data science platforms. Unique features include tree penalization, proportional leaf node shrinking, Newton boosting, and scalable computing architectures. It has frequently been the tool of choice for winning teams in Kaggle machine learning competitions.