Data Analysis with Python

Main.PythonDataAnalysis History


June 21, 2020, at 04:14 AM by 136.36.211.159 -
Deleted lines 237-255:

(:html:)

 <div id="disqus_thread"></div>
    <script type="text/javascript">
        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
        var disqus_shortname = 'apmonitor'; // required: replace example with your forum shortname

        /* * * DON'T EDIT BELOW THIS LINE * * */
        (function() {
            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
            dsq.src = 'https://' + disqus_shortname + '.disqus.com/embed.js';
            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
        })();
    </script>
    <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
    <a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>

(:htmlend:)

August 15, 2019, at 02:38 AM by 50.204.38.62 -
Changed lines 95-96 from:

sensors = data_file.ix[:,'s1':'s4']

to:

sensors = data_file.loc[:, 's1':'s4']

Added line 99:
# or use: print(sensors.head(6))
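
The change above swaps `.ix` for `.loc`; the `.ix` indexer was deprecated in pandas 0.20 and removed in pandas 1.0. A minimal sketch of the label-based slice follows. The inline CSV text and the extra `s5` column are made-up stand-ins for the course data file, not its real contents.

```python
import io
import pandas as pd

# Made-up sample rows standing in for data_with_headers.txt (assumption)
csv_text = """time,s1,s2,s3,s4,s5
10,1.0,2.0,3.0,4.0,99.0
20,1.5,2.5,3.5,4.5,99.0
30,2.0,3.0,4.0,5.0,99.0
"""
data_file = pd.read_csv(io.StringIO(csv_text))

# label-based slice: both end labels ('s1' and 's4') are included
sensors = data_file.loc[:, 's1':'s4']

# show the first rows (head stops early if fewer than 6 exist)
print(sensors.head(6))
```

Unlike positional slicing, a `.loc` label slice includes its end label, so `'s1':'s4'` selects four columns.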
March 08, 2019, at 03:13 AM by 10.37.149.103 -
Changed lines 110-114 from:
# column names
cn = result.columns.values
cn[-1] = 'avg' # change last column
result.columns = cn

to:

result.columns.values[-1] = 'avg'
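
The one-line form above writes into the array behind the column index. pandas documents `Index` objects as immutable, so an in-place write through `.values` may not behave the same way across pandas versions. A hedged alternative, using made-up values, builds a new list of labels and assigns it in one step:

```python
import pandas as pd

# Made-up values for illustration only
time = pd.Series([0.0, 1.0, 2.0], name='time')
sensors = pd.DataFrame({'s1': [1.0, 2.0, 3.0], 's2': [3.0, 4.0, 5.0]})
avg = sensors.mean(axis=1)  # unnamed Series, so concat labels its column 0

result = pd.concat([time, sensors, avg], axis=1)

# replace only the last label by assigning a complete new list of names
result.columns = list(result.columns[:-1]) + ['avg']
print(result.columns.tolist())
```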

March 08, 2019, at 03:12 AM by 10.37.149.103 -
Changed lines 89-90 from:

data_file = pd.read_csv('data_with_headers.csv')

to:

url='http://apmonitor.com/che263/uploads/Main/data_with_headers.txt'
data_file = pd.read_csv(url)

Added lines 110-114:
# column names
cn = result.columns.values
cn[-1] = 'avg' # change last column
result.columns = cn

March 08, 2019, at 03:08 AM by 10.37.149.103 -
Changed line 72 from:
to:
March 08, 2019, at 03:07 AM by 10.37.149.103 -
Added lines 71-72:
November 13, 2018, at 11:18 AM by 174.148.17.88 -
Changed line 165 from:

Once the data is imported, it can be analyzed with many different tools such as machine learning algorithms. Below is an example of using the data for analysis of correlation between open and close price of Google stock.

to:

Once the data is imported, it can be analyzed with many different tools, such as machine learning algorithms. Below is an example that uses the data to analyze the correlation between the open and close prices of Google publicly traded shares.
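
As a lighter-weight illustration of the same idea, the sketch below computes a correlation coefficient and a straight-line fit with NumPy. This is a swapped-in technique, not the GEKKO exponential regression recorded in this history, and the prices are made-up numbers rather than values from goog.csv.

```python
import numpy as np

# Made-up prices for illustration only (not taken from goog.csv)
open_price = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
close_price = np.array([101.0, 103.0, 100.0, 106.0, 108.0])

# Pearson correlation coefficient between the two series
r = np.corrcoef(open_price, close_price)[0, 1]

# least-squares straight line: close = slope*open + intercept
slope, intercept = np.polyfit(open_price, close_price, 1)

print('correlation: ' + str(round(r, 3)))
print('slope: ' + str(round(slope, 3)))
print('intercept: ' + str(round(intercept, 3)))
```

A coefficient near 1 indicates that the close price tends to rise with the open price; it does not by itself say which model form fits best.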

November 13, 2018, at 11:17 AM by 174.148.17.88 -
Added lines 167-168:
Deleted lines 228-229:
November 13, 2018, at 11:16 AM by 174.148.17.88 -
Added lines 164-228:

Once the data is imported, it can be analyzed with many different tools such as machine learning algorithms. Below is an example of using the data for analysis of correlation between open and close price of Google stock.

(:toggle hide online2 button show="Show Python Regression Analysis":)
(:div id=online2:)
(:source lang=python:)
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Google stock
url = 'https://apmonitor.com/che263/uploads/Main/goog.csv'

# import data with pandas
data = pd.read_csv(url)
print(data['Close'][0:5])
print('min: '+str(min(data['Close'][0:20])))
print('max: '+str(max(data['Close'][0:20])))

# GEKKO model
m = GEKKO()

# input data
x = m.Param(value=np.array(data['Open']))

# parameters to optimize
a = m.FV()
a.STATUS=1
b = m.FV()
b.STATUS=1
c = m.FV()
c.STATUS=1

# variables
y = m.CV(value=np.array(data['Close']))
y.FSTATUS=1

# regression equation
m.Equation(y==b*m.exp(a*x)+c)

# regression mode
m.options.IMODE = 2

# optimize
m.options.solver = 1
m.solve(disp=True)

# print parameters
print('Optimized, a = ' + str(a.value[0]))
print('Optimized, b = ' + str(b.value[0]))
print('Optimized, c = ' + str(c.value[0]))

# plot data
plt.figure()
plt.plot(data['Open'],data['Close'],'ro',label='Stock Data')
plt.plot(x.value,y.value,'bx',label='Predicted')
plt.xlabel('Open Price')
plt.ylabel('Close Price')
plt.legend()
plt.show()
(:sourceend:)
(:divend:)

May 08, 2018, at 12:58 PM by 45.56.3.173 -
Changed lines 145-152 from:

try:
    import wget
except:
    # install wget if needed
    import pip
    pip.main(['install','wget'])
    import wget
to:
Deleted line 146:

stock = 'GOOG'

Changed lines 148-154 from:
# url = 'https://chart.finance.yahoo.com/table.csv?s='+stock
filename = wget.download(url)

# rename file
from shutil import move
move(filename,stock.lower()+'.csv')

to:
Changed line 150 from:

data = pd.read_csv(stock+'.csv')

to:

data = pd.read_csv(url)

Deleted line 143:

from matplotlib.finance import *

Changed lines 154-155 from:

stock = 'GOOGL'
url = 'https://chart.finance.yahoo.com/table.csv?s='+stock

to:

stock = 'GOOG'
url = 'https://apmonitor.com/che263/uploads/Main/goog.csv'
# url = 'https://chart.finance.yahoo.com/table.csv?s='+stock
Changed lines 166-168 from:

print('min: '+str(min(data['Close'][0:30])))
print('max: '+str(max(data['Close'][0:30])))

to:

print('min: '+str(min(data['Close'][0:20])))
print('max: '+str(max(data['Close'][0:20])))

Changed lines 171-172 from:

plt.plot(data['Open'][0:30])
plt.plot(data['Close'][0:30])

to:

plt.plot(data['Open'][0:20])
plt.plot(data['Close'][0:20])

Deleted lines 73-74:
Added lines 133-134:
Added lines 73-74:
September 08, 2017, at 05:26 PM by 174.148.248.209 -
Changed line 16 from:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="https://www.youtube.com/embed/pQv6zMlYJ0A" frameborder="0" allowfullscreen></iframe>

Changed lines 5-7 from:

Python Data Analysis

A common task for scientists and engineers is to analyze data from an external source that may be in a text or comma separated value (CSV) format. By importing the data into Python, data analysis such as statistics, trending, or calculations can be made to synthesize the information into relevant and actionable information. This tutorial demonstrates how to import data, perform a basic analysis, trend the results, and export the results to another text file. Two examples are provided with Numpy and Pandas. Script files of the Python source code with sample data are below.

to:

A common task for scientists and engineers is to analyze data from an external source, often in a text or comma-separated value (CSV) format. Once the data is imported into Python, statistics, trending, and other calculations can turn it into relevant and actionable information. The tutorials below demonstrate how to import data (including online data), perform a basic analysis, trend the results, and export the results to another text file. Two examples are provided, one with NumPy and one with pandas. Script files of the Python source code with sample data are below.
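
The import, analyze, and export steps named in the paragraph above can be sketched with NumPy as follows. The sample rows are made up, the export goes to a temporary folder, and the plotting step is left out so that the sketch runs without a display. Only the file name `export_from_python.txt` is borrowed from the NumPy script recorded in this history.

```python
import io
import os
import tempfile
import numpy as np

# Made-up sample rows: time in seconds, then four sensor columns (assumption)
raw = io.StringIO("100,1.0,2.0,3.0,4.0\n"
                  "160,2.0,3.0,4.0,5.0\n"
                  "220,3.0,4.0,5.0,6.0\n")

# import: parse comma-delimited text into a 2-D float array
data = np.genfromtxt(raw, delimiter=',')

# analyze: shift time to start at zero and average the sensors per row
time = data[:, 0] - data[0, 0]
avg = np.mean(data[:, 1:5], axis=1)

# export: write time and average as two comma-delimited columns,
# then read the file back to confirm the round trip
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'export_from_python.txt')
    np.savetxt(path, np.column_stack((time, avg)), delimiter=',')
    reloaded = np.genfromtxt(path, delimiter=',')

print(reloaded)
```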

Added lines 9-10:

Added lines 70-71:

Added lines 130-131:

Added lines 178-179:

Changed lines 134-135 from:

(:toggle hide pandas button show="Show Python Source":) (:div id=pandas:)

to:

(:toggle hide online button show="Show Python Source":) (:div id=online:)

Added lines 15-18:

(:html:) <iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe> (:htmlend:)

Added lines 68-71:

Import Data and Analyze with Pandas

Changed line 73 from:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe>

Deleted lines 75-82:

Import Data and Analyze with Pandas

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe> (:htmlend:)

Added lines 134-135:

(:toggle hide pandas button show="Show Python Source":) (:div id=pandas:)

Added line 171:

(:divend:)

Changed line 15 from:

(:toggle hide numpy button show="Show Solution":)

to:

(:toggle hide numpy button show="Show Python (NumPy) Source":)

Changed line 70 from:
to:
Changed lines 11-24 from:

(:html:) <iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe> (:htmlend:)

Import Data and Analyze with Pandas

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe> (:htmlend:)

to:

(:toggle hide numpy button show="Show Solution":) (:div id=numpy:)

Changed line 18 from:
# import Numpy, Pandas, and Matplotlib
to:
# Numpy (data import, manipulation, export)
Changed line 20 from:

import pandas as pd

to:
# Matplotlib (create trends)
Changed lines 24-27 from:

data_file = pd.read_csv('data_with_headers.csv')

# create time vector from imported data

time = data_file['time']

to:

data_file = np.genfromtxt('data_file.txt', delimiter=',')

# create time vector from imported data (starts from index 0)

time = data_file[:,0]

Changed lines 29-30 from:

sensors = data_file.ix[:,'s1':'s4']

to:

sensors = data_file[:,1:5]

Changed lines 42-49 from:

my_data = [time, sensors, avg]
result = pd.concat(my_data,axis=1)

result.to_csv('result.csv')
#result.to_excel('result.xlsx')
result.to_html('result.htm')
result.to_clipboard()

to:
# stack time and avg as column vectors
my_data = np.vstack((time,sensors.T,avg))

# transpose data
my_data = my_data.T

# save text file with comma delimiter
np.savetxt('export_from_python.txt',my_data,delimiter=',')
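
The stack-then-transpose step above is easier to follow with the array shapes written out. The sketch below uses made-up values and also shows `np.column_stack`, which appears to give the same result in one call. That alternative is a suggestion, not part of the recorded script.

```python
import numpy as np

# Made-up values for illustration only
time = np.array([0.0, 60.0, 120.0])            # shape (3,)
sensors = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 3.0, 4.0, 5.0],
                    [3.0, 4.0, 5.0, 6.0]])     # shape (3, 4)
avg = np.mean(sensors, axis=1)                 # shape (3,)

# vstack stacks rows, so sensors is transposed to (4, 3) first
stacked = np.vstack((time, sensors.T, avg))    # shape (6, 3)
my_data = stacked.T                            # shape (3, 6)

# column_stack treats each 1-D input as a column and builds (3, 6) directly
same = np.column_stack((time, sensors, avg))

print(my_data.shape)
print(np.array_equal(my_data, same))
```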

Changed lines 51-52 from:

plt.plot(time,sensors['s1'],'r-')
plt.plot(time,avg,'b.')

to:

plt.plot(time/60.0,sensors[:,1],'ro')
plt.plot(time/60.0,avg,'b.')

Changed lines 54-55 from:

plt.legend(['Sensor 2','Average'])
plt.xlabel('Time (sec)')

to:

plt.legend(['Sensor 2','Average Sensors 1-4'])
plt.xlabel('Time (min)')

Changed line 59 from:
# show the figure on the screen
to:
# show the figure on the screen (pauses execution until closed)
Added lines 62-124:

(:divend:)

(:html:) <iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe> (:htmlend:)

Import Data and Analyze with Pandas

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe> (:htmlend:)

(:toggle hide pandas button show="Show Python (Pandas) Source":)
(:div id=pandas:)
(:source lang=python:)
# import Numpy, Pandas, and Matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# load the data file
data_file = pd.read_csv('data_with_headers.csv')

# create time vector from imported data
time = data_file['time']

# parse good sensor data from imported data
sensors = data_file.ix[:,'s1':'s4']

# display the first 6 sensor rows
print(sensors[0:6])

# adjust time to start at zero by subtracting the
#  first element in the time vector (index = 0)
time = time - time[0]

# calculate the average of the sensor readings
avg = np.mean(sensors,1) # over the 2nd dimension

# export data
my_data = [time, sensors, avg]
result = pd.concat(my_data,axis=1)

result.to_csv('result.csv')
#result.to_excel('result.xlsx')
result.to_html('result.htm')
result.to_clipboard()

# generate a figure
plt.figure(1)
plt.plot(time,sensors['s1'],'r-')
plt.plot(time,avg,'b.')

# add text labels to the plot
plt.legend(['Sensor 2','Average'])
plt.xlabel('Time (sec)')
plt.ylabel('Sensor Values')

# save the figure as a PNG file
plt.savefig('my_Python_plot.png')

# show the figure on the screen
plt.show()
(:sourceend:)
(:divend:)

March 01, 2017, at 05:39 AM by 45.56.3.173 -
Changed line 72 from:

Import Data from an Internet Source

to:

Import Online Data and Analyze

March 01, 2017, at 05:38 AM by 45.56.3.173 -
Added lines 75-78:

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/KzOEmMiPSjg" frameborder="0" allowfullscreen></iframe> (:htmlend:)

March 01, 2017, at 05:10 AM by 45.56.3.173 -
Deleted lines 24-25:

Source Code

Changed line 74 from:
to:

Below is an example of pulling data from an Internet source, such as financial information about a stock. The example shows how to request, parse, and display the financial data.
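
The parse and display steps can be sketched without a network connection by reading CSV text from memory. In the sketch below the inline text stands in for the downloaded file, and the `Date` column and all values are made-up assumptions. Only the `Open` and `Close` column names come from the scripts in this history.

```python
import io
import pandas as pd

# Made-up rows standing in for the downloaded file (assumption)
csv_text = """Date,Open,Close
2017-02-28,10.0,10.5
2017-02-27,11.0,12.0
2017-02-24,12.5,11.5
"""

# parse: read_csv accepts a file-like object as well as a path or URL
data = pd.read_csv(io.StringIO(csv_text))

# display: summarize the first two closing prices
recent = data['Close'][0:2]
print(recent)
print('min: ' + str(recent.min()))
print('max: ' + str(recent.max()))
```

Passing a URL string to `pd.read_csv` in place of the `StringIO` object performs the request step as well, as the later revisions in this history do.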

March 01, 2017, at 05:08 AM by 45.56.3.173 -
Added lines 71-110:

plt.show()
(:sourceend:)

Import Data from an Internet Source

(:source lang=python:)
import pandas as pd
from matplotlib.finance import *
import matplotlib.pyplot as plt
try:
    import wget
except:
    # install wget if needed
    import pip
    pip.main(['install','wget'])
    import wget

# stock ticker symbol
stock = 'GOOGL'
url = 'https://chart.finance.yahoo.com/table.csv?s='+stock
filename = wget.download(url)

# rename file
from shutil import move
move(filename,stock.lower()+'.csv')

# import data with pandas
data = pd.read_csv(stock+'.csv')
print(data['Close'][0:5])
print('min: '+str(min(data['Close'][0:30])))
print('max: '+str(max(data['Close'][0:30])))

# plot data with pyplot
plt.figure()
plt.plot(data['Open'][0:30])
plt.plot(data['Close'][0:30])
plt.xlabel('days ago')
plt.ylabel('price')

March 01, 2017, at 05:05 AM by 45.56.3.173 -
Changed lines 27-71 from:
 # import Numpy, Pandas, and Matplotlib
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt

 # load the data file
 data_file = pd.read_csv('data_with_headers.csv')

 # create time vector from imported data
 time = data_file['time']
 # parse good sensor data from imported data
 sensors = data_file.ix[:,'s1':'s4']

 # display the first 6 sensor rows
 print(sensors[0:6])

 # adjust time to start at zero by subtracting the
 #  first element in the time vector (index = 0)
 time = time - time[0]

 # calculate the average of the sensor readings
 avg = np.mean(sensors,1) # over the 2nd dimension

 # export data
 my_data = [time, sensors, avg]
 result = pd.concat(my_data,axis=1)

 result.to_csv('result.csv')
 #result.to_excel('result.xlsx')
 result.to_html('result.htm')
 result.to_clipboard()

 # generate a figure
 plt.figure(1)
 plt.plot(time,sensors['s1'],'r-')
 plt.plot(time,avg,'b.')
 # add text labels to the plot
 plt.legend(['Sensor 2','Average'])
 plt.xlabel('Time (sec)')
 plt.ylabel('Sensor Values')
 # save the figure as a PNG file
 plt.savefig('my_Python_plot.png')
 # show the figure on the screen
 plt.show()
to:

(:source lang=python:)
# import Numpy, Pandas, and Matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# load the data file
data_file = pd.read_csv('data_with_headers.csv')

# create time vector from imported data
time = data_file['time']

# parse good sensor data from imported data
sensors = data_file.ix[:,'s1':'s4']

# display the first 6 sensor rows
print(sensors[0:6])

# adjust time to start at zero by subtracting the
#  first element in the time vector (index = 0)
time = time - time[0]

# calculate the average of the sensor readings
avg = np.mean(sensors,1) # over the 2nd dimension

# export data
my_data = [time, sensors, avg]
result = pd.concat(my_data,axis=1)

result.to_csv('result.csv')
#result.to_excel('result.xlsx')
result.to_html('result.htm')
result.to_clipboard()

# generate a figure
plt.figure(1)
plt.plot(time,sensors['s1'],'r-')
plt.plot(time,avg,'b.')

# add text labels to the plot
plt.legend(['Sensor 2','Average'])
plt.xlabel('Time (sec)')
plt.ylabel('Sensor Values')

# save the figure as a PNG file
plt.savefig('my_Python_plot.png')

# show the figure on the screen
plt.show()
(:sourceend:)

Changed line 41 from:
 print sensors[0:6]
to:
 print(sensors[0:6])
Changed line 14 from:

<iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

Changed line 22 from:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe>

Changed line 14 from:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="https://www.youtube.com/embed/FXhED53VZ50" frameborder="0" allowfullscreen></iframe>

Changed line 7 from:

A common task for scientists and engineers is to analyze data from an external source that may be in a text or comma separated value (CSV) format. By importing the data into Python, data analysis such as statistics, trending, or calculations can be made to synthesize the information into relevant and actionable information. This tutorial demonstrates how to import data, perform a basic analysis, trend the results, and export the results to another text file. A script file of the Python source code with sample data is below.

to:

A common task for scientists and engineers is to analyze data from an external source that may be in a text or comma separated value (CSV) format. By importing the data into Python, data analysis such as statistics, trending, or calculations can be made to synthesize the information into relevant and actionable information. This tutorial demonstrates how to import data, perform a basic analysis, trend the results, and export the results to another text file. Two examples are provided with Numpy and Pandas. Script files of the Python source code with sample data are below.

Added lines 9-10:

Import Data and Analyze with Numpy

Added lines 16-71:

Import Data and Analyze with Pandas

(:html:) <iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe> (:htmlend:)

Source Code

 # import Numpy, Pandas, and Matplotlib
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt

 # load the data file
 data_file = pd.read_csv('data_with_headers.csv')

 # create time vector from imported data
 time = data_file['time']
 # parse good sensor data from imported data
 sensors = data_file.ix[:,'s1':'s4']

 # display the first 6 sensor rows
 print sensors[0:6]

 # adjust time to start at zero by subtracting the
 #  first element in the time vector (index = 0)
 time = time - time[0]

 # calculate the average of the sensor readings
 avg = np.mean(sensors,1) # over the 2nd dimension

 # export data
 my_data = [time, sensors, avg]
 result = pd.concat(my_data,axis=1)

 result.to_csv('result.csv')
 #result.to_excel('result.xlsx')
 result.to_html('result.htm')
 result.to_clipboard()

 # generate a figure
 plt.figure(1)
 plt.plot(time,sensors['s1'],'r-')
 plt.plot(time,avg,'b.')
 # add text labels to the plot
 plt.legend(['Sensor 2','Average'])
 plt.xlabel('Time (sec)')
 plt.ylabel('Sensor Values')
 # save the figure as a PNG file
 plt.savefig('my_Python_plot.png')
 # show the figure on the screen
 plt.show()
April 28, 2014, at 01:04 PM by 197.149.171.26 -
Changed line 12 from:

<iframe width="560" height="315" src="//www.youtube.com/embed/E56egH10RJA?list=PLLBUgWXdTBDi-E--rwBujaNkTejLNI6ap" frameborder="0" allowfullscreen></iframe>

to:

<iframe width="560" height="315" src="//www.youtube.com/embed/Tq6rCWPdXoQ" frameborder="0" allowfullscreen></iframe>

April 28, 2014, at 11:59 AM by 4.28.34.254 -
Added lines 7-10:

A common task for scientists and engineers is to analyze data from an external source that may be in a text or comma separated value (CSV) format. By importing the data into Python, data analysis such as statistics, trending, or calculations can be made to synthesize the information into relevant and actionable information. This tutorial demonstrates how to import data, perform a basic analysis, trend the results, and export the results to another text file. A script file of the Python source code with sample data is below.

Added lines 14-15:

This tutorial can also be completed with Excel and Matlab. Click on the appropriate link for additional information.

April 28, 2014, at 09:33 AM by 197.149.171.26 -
Added lines 1-28:

(:title Data Analysis with Python:)
(:keywords big data, data analysis, Python, numpy, spreadsheet, nonlinear, optimization, engineering optimization, university course:)
(:description Data Analysis with Python - Problem-Solving Techniques for Chemical Engineers at Brigham Young University:)

Python Data Analysis

(:html:) <iframe width="560" height="315" src="//www.youtube.com/embed/E56egH10RJA?list=PLLBUgWXdTBDi-E--rwBujaNkTejLNI6ap" frameborder="0" allowfullscreen></iframe> (:htmlend:)

