Pandas Time-Series

Time-series data is generated sequentially with repeated measurements that typically occur at regular intervals. The pandas package collects and analyzes time-series data in a DataFrame.

Pandas reads stored files or data tables from a database. It produces a DataFrame as a table with headers and each column as a Series. The data source can either be local

# local source
pd.read_csv('data.csv')

or from an online source such as a URL.

# online source
pd.read_csv('http://apmonitor.com/dde/uploads/Main/tclab.txt')

There are additional read options in Pandas such as from the Clipboard, Excel, Feather, Flat file, Google BigQuery, HTML, JSON, HDFStore, Pickled Files, Parquet, ORC, SAS, SPSS, SQL Database, and STATA. There are also options to write a DataFrame to a CSV file or to the following data structures or online repositories.

# write DataFrame dx
dx.to_clipboard()
dx.to_csv()
dx.to_dict()
dx.to_excel()
dx.to_feather()
dx.to_gbq()
dx.to_hdf()
dx.to_html()
dx.to_json()
dx.to_latex()
dx.to_markdown()
dx.to_parquet()
dx.to_pickle()
dx.to_records()
dx.to_sql()
dx.to_stata()
dx.to_string()

Join Data

For time series data, the tables are joined to match features and labels at particular time points.

Generate DataFrames dx and dy

import numpy as np
import pandas as pd

tx = np.linspace(0,1,4); x  = np.cos(tx)
dx = pd.DataFrame({'Time':tx,'x':x})

ty = np.linspace(0,1,3); y  = np.sin(ty)
dy = pd.DataFrame({'Time':ty,'y':y})
          Time         x
   0  0.000000  1.000000
   1  0.333333  0.944957
   2  0.666667  0.785887
   3  1.000000  0.540302

          Time         y
   0       0.0  0.000000
   1       0.5  0.479426
   2       1.0  0.841471

The two dataframes are joined with an index ('Time') that is set as the key for comparing and merging.

dx = dx.set_index('Time')
dy = dy.set_index('Time')

Options for merging are to join only the entries of dy that match dx.

z = dx.join(dy)

All missing entries have NaN for Not a Number. The default is a left join meaning that DataFrame z has the same index as x.

                    x         y
   Time                        
   0.000000  1.000000  0.000000
   0.333333  0.944957       NaN
   0.666667  0.785887       NaN
   1.000000  0.540302  0.841471

Right Join

A right join results in the index of y.

z = dx.join(dy,how='right')
                x         y
   Time                    
   0.0   1.000000  0.000000
   0.5        NaN  0.479426
   1.0   0.540302  0.841471

Inner Join

z = dx.join(dy,how='inner')

An inner join gives only the rows with an index that is common to both DataFrames.

                x         y
   Time                    
   0.0   1.000000  0.000000
   1.0   0.540302  0.841471

Outer Join

z = dx.join(dy,how='outer',sort=True)

An outer join with sort=True gives all of the entries from both DataFrames.

                    x         y
   Time                        
   0.000000  1.000000  0.000000
   0.333333  0.944957       NaN
   0.500000       NaN  0.479426
   0.666667  0.785887       NaN
   1.000000  0.540302  0.841471

Append Data

Multiple data sets can be combined with the .append() function. This is needed to combine multiple data sets with the same types of data but stored separately.

import numpy as np
import pandas as pd
tx = np.linspace(0,1,4); x  = np.cos(tx)
dx = pd.DataFrame({'Time':tx,'x':x})
tx = np.linspace(0,1,3)
x  = np.cos(tx)
dy = pd.DataFrame({'Time':tx,'x':x})

In this case, both dx and dy have the same data but at different times.

          Time         x
   0  0.000000  1.000000
   1  0.333333  0.944957
   2  0.666667  0.785887
   3  1.000000  0.540302

          Time         x
       0   0.0  1.000000
       1   0.5  0.877583
       2   1.0  0.540302

The functions append, sort_values, drop_duplicates, and reset_index are used to append, sort, consolidate, and reset the index of the table.

dx.append(dy)\
  .sort_values(by='Time')\
  .drop_duplicates(subset='Time')\
  .reset_index(drop=True)
          Time         x
   0  0.000000  1.000000
   1  0.333333  0.944957
   2  0.500000  0.877583
   3  0.666667  0.785887
   4  1.000000  0.540302

Activity

Complete two exercises with the Temperature Control Lab TCLab or TCLab Model (digital twin) TCLabModel. Collect data on time, heater 1 (Q1), heater 2 (Q2), temperature 1 (T1), and temperature 2 (T2).

1. Import stored data as a Pandas DataFrame.

2. Collect streaming data: add a data row to a Pandas DataFrame each second for 60 seconds. Change heater 1 (Q1) from 0 to 70% output at 5 seconds.

For each, generate a plot that shows the measured temperature and heater values on separate subplots. Add appropriate labels to the plots such as x-label and legend. Compute basic statistics such as Minimum, Maximum, Average, and Standard Deviation values. Show a table that summarizes the statistics.

Solution


✅ Knowledge Check

1. Which statement best describes the purpose of the pd.read_csv() function in pandas?

A. It writes the data in a DataFrame to a CSV file.
B. It reads data from a CSV file into a pandas DataFrame.
C. It merges two pandas DataFrames based on a common key.
D. It sorts the rows in a DataFrame based on a specified column.

2. After joining two dataframes dx and dy with an outer join and the sort=True argument, which of the following is true about the resultant dataframe?

A. Only the entries common to both dataframes will be present.
B. The resultant dataframe will have an index from dx.
C. The resultant dataframe will contain all entries from both dx and dy.
D. NaN values will be replaced by zeros.
Streaming Chatbot
💬