Pandas Time-Series

Time-series data is generated sequentially with repeated measurements that typically occur at regular intervals. The pandas package collects and analyzes time-series data in a DataFrame.

Pandas reads stored files or data tables from a database. It produces a DataFrame as a table with headers and each column as a Series. The data source can either be local
pd.read_csv('data.csv')
or from an online source such as a URL.
pd.read_csv('http://apmonitor.com/dde/uploads/Main/tclab.txt')
There are additional read options in Pandas such as from the Clipboard, Excel, Feather, Flat file, Google BigQuery, HTML, JSON, HDFStore, Pickled Files, Parquet, ORC, SAS, SPSS, SQL Database, and STATA. There are also options to write a DataFrame to a CSV file or to the following data structures or online repositories.
dx.to_clipboard()
dx.to_csv()
dx.to_dict()
dx.to_excel()
dx.to_feather()
dx.to_gbq()
dx.to_hdf()
dx.to_html()
dx.to_json()
dx.to_latex()
dx.to_markdown()
dx.to_parquet()
dx.to_pickle()
dx.to_records()
dx.to_sql()
dx.to_stata()
dx.to_string()
Join Data
For time series data, the tables are joined to match features and labels at particular time points.
Generate DataFrames dx and dy
import pandas as pd
tx = np.linspace(0,1,4); x = np.cos(tx)
dx = pd.DataFrame({'Time':tx,'x':x})
ty = np.linspace(0,1,3); y = np.sin(ty)
dy = pd.DataFrame({'Time':ty,'y':y})
Time x 0 0.000000 1.000000 1 0.333333 0.944957 2 0.666667 0.785887 3 1.000000 0.540302 Time y 0 0.0 0.000000 1 0.5 0.479426 2 1.0 0.841471
The two dataframes are joined with an index ('Time') that is set as the key for comparing and merging.
dy = dy.set_index('Time')
Options for merging are to join only the entries of dy that match dx.
All missing entries have NaN for Not a Number. The default is a left join meaning that DataFrame z has the same index as x.
x y Time 0.000000 1.000000 0.000000 0.333333 0.944957 NaN 0.666667 0.785887 NaN 1.000000 0.540302 0.841471
Right Join
A right join results in the index of y.
x y Time 0.0 1.000000 0.000000 0.5 NaN 0.479426 1.0 0.540302 0.841471
Inner Join
An inner join gives only the rows with an index that is common to both DataFrames.
x y Time 0.0 1.000000 0.000000 1.0 0.540302 0.841471
Outer Join
An outer join with sort=True gives all of the entries from both DataFrames.
x y Time 0.000000 1.000000 0.000000 0.333333 0.944957 NaN 0.500000 NaN 0.479426 0.666667 0.785887 NaN 1.000000 0.540302 0.841471
Append Data
Multiple data sets can be combined with the .append() function. This is needed to combine multiple data sets with the same types of data but stored separately.
import pandas as pd
tx = np.linspace(0,1,4); x = np.cos(tx)
dx = pd.DataFrame({'Time':tx,'x':x})
tx = np.linspace(0,1,3)
x = np.cos(tx)
dy = pd.DataFrame({'Time':tx,'x':x})
In this case, both dx and dy have the same data but at different times.
Time x 0 0.000000 1.000000 1 0.333333 0.944957 2 0.666667 0.785887 3 1.000000 0.540302 Time x 0 0.0 1.000000 1 0.5 0.877583 2 1.0 0.540302
The functions append, sort_values, drop_duplicates, and reset_index are used to append, sort, consolidate, and reset the index of the table.
.sort_values(by='Time')\
.drop_duplicates(subset='Time')\
.reset_index(drop=True)
Time x 0 0.000000 1.000000 1 0.333333 0.944957 2 0.500000 0.877583 3 0.666667 0.785887 4 1.000000 0.540302
Activity

Complete two exercises with the Temperature Control Lab TCLab or TCLab Model (digital twin) TCLabModel. Collect data on time, heater 1 (Q1), heater 2 (Q2), temperature 1 (T1), and temperature 2 (T2).
1. Import stored data as a Pandas DataFrame.
2. Collect streaming data: add a data row to a Pandas DataFrame each second for 60 seconds. Change heater 1 (Q1) from 0 to 70% output at 5 seconds.
For each, generate a plot that shows the measured temperature and heater values on separate subplots. Add appropriate labels to the plots such as x-label and legend. Compute basic statistics such as Minimum, Maximum, Average, and Standard Deviation values. Show a table that summarizes the statistics.
Solution
✅ Knowledge Check
1. Which statement best describes the purpose of the pd.read_csv() function in pandas?
2. After joining two dataframes dx and dy with an outer join and the sort=True argument, which of the following is true about the resultant dataframe?