
Data Analysis with Python

A common task for scientists and engineers is to analyze data from an external source that may be in a text or comma-separated value (CSV) format. By importing the data into Python, analyses such as statistics, trending, or other calculations can be performed to synthesize the raw measurements into relevant and actionable information. The tutorials below demonstrate how to import data (including online data), perform a basic analysis, trend the results, and export the results to another text file. Two examples are provided, one with NumPy and one with pandas. Script files of the Python source code with sample data are available below.
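
The sample data files referenced below contain a time column followed by sensor readings. If the original sample files are not at hand, a minimal sketch like the following can generate synthetic stand-ins with the layout the examples expect (one time column plus four sensor columns; the file names match the scripts below, but the values are arbitrary placeholders, not the original sample data):

# generate synthetic stand-in data files for the examples below
import numpy as np
import pandas as pd

n = 100
time = 1000.0 + 5.0*np.arange(n)         # time stamps in seconds (arbitrary start)
sensors = 20.0 + np.random.randn(n, 4)   # four synthetic sensor channels

# 'data_file.txt': comma-delimited, no header (used by the NumPy example)
np.savetxt('data_file.txt', np.column_stack((time, sensors)), delimiter=',')

# 'data_with_headers.csv': columns time, s1-s4 (used by the Pandas example)
cols = ['time', 's1', 's2', 's3', 's4']
df = pd.DataFrame(np.column_stack((time, sensors)), columns=cols)
df.to_csv('data_with_headers.csv', index=False)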


Import Data and Analyze with Numpy

# Numpy (data import, manipulation, export)
import numpy as np
# Matplotlib (create trends)
import matplotlib.pyplot as plt

# load the data file
data_file = np.genfromtxt('data_file.txt', delimiter=',')

# create time vector from imported data (starts from index 0)
time = data_file[:,0]
# parse good sensor data from imported data
sensors = data_file[:,1:5]

# display the first 6 sensor rows
print(sensors[0:6])

# adjust time to start at zero by subtracting the
#  first element in the time vector (index = 0)
time = time - time[0]

# calculate the average of the sensor readings
avg = np.mean(sensors, axis=1) # average across the sensor columns (axis 1)

# export data
# stack time, the sensor columns, and the average as rows
my_data = np.vstack((time,sensors.T,avg))
# transpose so each series becomes a column
my_data = my_data.T
# save text file with comma delimiter
np.savetxt('export_from_python.txt',my_data,delimiter=',')

# generate a figure
plt.figure(1)
plt.plot(time/60.0,sensors[:,1],'ro')
plt.plot(time/60.0,avg,'b.')
# add text labels to the plot
plt.legend(['Sensor 2','Average Sensors 1-4'])
plt.xlabel('Time (min)')
plt.ylabel('Sensor Values')
# save the figure as a PNG file
plt.savefig('my_Python_plot.png')
# show the figure on the screen (pauses execution until closed)
plt.show()
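
To confirm the export, the file written above can be read back with NumPy. A quick check, assuming the same 'export_from_python.txt' produced by the script:

import numpy as np

# read the exported file back and report its shape
# expected: (number of rows, 6) = time, four sensors, and the average
check = np.loadtxt('export_from_python.txt', delimiter=',')
print(check.shape)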

Import Data and Analyze with Pandas

# import Numpy, Pandas, and Matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# load the data file
data_file = pd.read_csv('data_with_headers.csv')

# create time vector from imported data
time = data_file['time']
# parse good sensor data from imported data
sensors = data_file.loc[:,'s1':'s4']

# display the first 6 sensor rows
print(sensors[0:6])

# adjust time to start at zero by subtracting the
#  first element in the time vector (index = 0)
time = time - time[0]

# calculate the average of the sensor readings
avg = sensors.mean(axis=1) # average across the sensor columns (axis 1)

# export data
my_data = [time, sensors, avg]
result = pd.concat(my_data,axis=1)

result.to_csv('result.csv')
#result.to_excel('result.xlsx')
result.to_html('result.htm')
result.to_clipboard()

# generate a figure
plt.figure(1)
plt.plot(time,sensors['s1'],'r-')
plt.plot(time,avg,'b.')
# add text labels to the plot
plt.legend(['Sensor 1','Average'])
plt.xlabel('Time (sec)')
plt.ylabel('Sensor Values')
# save the figure as a PNG file
plt.savefig('my_Python_plot.png')
# show the figure on the screen
plt.show()
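
Pandas also provides the summary statistics mentioned in the introduction directly on a DataFrame. A minimal sketch, assuming the same 'data_with_headers.csv' file and column names as above:

import pandas as pd

# reload the sample file and select the sensor columns (same as above)
data_file = pd.read_csv('data_with_headers.csv')
sensors = data_file.loc[:,'s1':'s4']

# count, mean, standard deviation, min, quartiles, and max for each sensor
print(sensors.describe())
# pairwise correlation between the sensor channels
print(sensors.corr())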

Import Online Data and Analyze

Below is an example of pulling data from an Internet source, such as financial information about a stock. The example shows how to request, parse, and display the financial data.

import pandas as pd
import matplotlib.pyplot as plt
try:
    import wget
except ImportError:
    # install the wget package if it is not available
    import subprocess, sys
    subprocess.check_call([sys.executable,'-m','pip','install','wget'])
    import wget

# stock ticker symbol
stock = 'GOOGL'
url = 'http://chart.finance.yahoo.com/table.csv?s='+stock
filename = wget.download(url)

# rename file
from shutil import move
move(filename,stock.lower()+'.csv')

# import data with pandas
data = pd.read_csv(stock.lower()+'.csv')
print(data['Close'][0:5])
print('min: '+str(min(data['Close'][0:30])))
print('max: '+str(max(data['Close'][0:30])))

# plot data with pyplot
plt.figure()
plt.plot(data['Open'][0:30])
plt.plot(data['Close'][0:30])
plt.xlabel('days ago')
plt.ylabel('price')
plt.show()
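
As an alternative to downloading the file with wget, pandas can read a CSV directly from a URL. A minimal sketch, assuming the endpoint still returns CSV data (the Yahoo chart URL above may no longer be available):

import pandas as pd

stock = 'GOOGL'
url = 'http://chart.finance.yahoo.com/table.csv?s='+stock

# read_csv accepts a URL and parses the response as CSV
data = pd.read_csv(url)
print(data.head())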

This tutorial can also be completed with Excel and MATLAB; see the corresponding tutorials for additional information.

