MAS116/MAS117 PRESENTATION LAB 11
On this sheet you will combine LaTeX and HTML with Python. First you will use SciPy to integrate numerically, then learn how to export data from Python by writing to files which can then be read by other software.
Recall from MAS109 Introduction to Probability and Data Science that the probability density function (pdf) of the standard normal distribution is given by
so that for a normally distributed random variable . The cumulative distribution function (cdf) is given by
This integral has no simple expression so needs to be calculated numerically.
You saw previously how to calculate approximations to definite
integrals. However, in this case we want to define an integral from
so
the previous method wouldn’t work. We will use the integration functions from
the SciPy (short for Scientific Python) module. To integrate a function f
from a
to b
you use scipy.integrate.quad(f, a, b)
as in the following.
1import numpy as np 2import scipy.integrate as integrate 3 4 5def pdf(z): 6 """Calculate the probability density function.""" 7 return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi) 8 9integral = integrate.quad(pdf, 0, 2) 10print("The integral of phi from 0 to 2 is:", integral)
The output is a pair of numbers in round brackets: (result, error)
. The
calculated result
is within error
of the actual integral. Note that this
expression is not a list as it is in round brackets rather than square brackets. This
expression is a type called a tuple; it is a bit like a list, but you can’t change a
tuple. The important point here is that you access the individual entries in
a tuple in the same way that you do with a list, so to obtain just the
approximation to the integral and not the error, you should change the final line
to the following.
10print("Prob(0 <= Z <= 2) =", integral[0], 11 "plus or minus", integral[1])
This integration method will allow us to use
and
as
limits. We do this with np.inf
and -np.inf
. For instance, add the following
lines.
13integral = integrate.quad(pdf, -np.inf, 0) 14print("Prob(Z <= 0) =", integral[0], 15 "plus or minus", integral[1])
We will ignore the error term in this sheet, as you can often do unless you are
interested in precise numerics. We can now get Python to plot the cdf
within a range and to tabulate its values. Download the following program
cdf.py
from the module website (right-click, save-as). Run the program.
1import numpy as np 2import scipy.integrate as integrate 3import matplotlib.pyplot as plt 4 5 6def pdf(x): 7 """Calculate the probability density function.""" 8 return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) 9 10 11MAX = 3.5 12 13z = np.linspace(-MAX, MAX, int(2*MAX*16 + 1)) 14 15cdf = np.zeros_like(z) 16for i in range(len(z)): 17 cdf[i] = integrate.quad(pdf, -np.inf, z[i])[0] 18 19plt.plot(z, pdf(z)) 20plt.plot(z, cdf) 21plt.axis([-MAX, MAX, 0, 1.05]) 22plt.show() 23 24print("\n{0:^5} {1:^6}".format("z", "cdf(z)")) 25print("-"*13) 26for i in range(0, len(z), 8): 27 print("{0:5.2f} {1:.4f}".format(z[i], cdf[i]))
Let’s have a look at some of the lines.
MAX
. We will be calculating the cdf from -MAX
to MAX
.
z
but
with all the entries zero.
Now in the rest of this lab we will look at ways of exporting the data we have just created, so that it can be used in other software.
A graph in a LaTeX document looks a lot better if it is drawn by pgfplots than if it is just an imported picture; for one thing, the fonts will actually match. We will get Python to export the function values to a file and then get pgfplots to read in the values from the file to draw the graph.
Add the following line to the above program and run it.
29with open("cdf.data", mode=’w’) as cdf_file: 30 for i in range(0, len(z)): 31 cdf_file.write("{0:.2f} {1:.4f}\n" 32 .format(z[i], cdf[i]))
The file cdf.data
should have been created in your working
directory. Open the file in a text editor. You should see a table of
-values
and the corresponding cummulative density function values.
Let’s analyse what was done here. It is very similar to the table printing at
lines 26 and 27 of the program above. The main difference is that it is part of a
with
program block and we use the file_object.write()
method rather than
the print()
function. To get python to write a file, the general form is as follows.
with open(file_name, mode=’w’) as file_object: indented program block containing statements of the form file_object.write(string)
The first line creates a Python file object which refers to the file with the name
file_name
. The mode=’w’
part means that we will be ‘w’riting to this file. When
this command is issued, if the file already exists then its contents are deleted. If
you wanted to add some data to an already existing file then you use mode=’a’
,
which stands for ‘append’.
The command file_object.write(string)
will just output the string
string
to the file with name file_name
instead of printing it to the
screen.
The file cdf_graph_demo.tex
is available on the module webpage (right click to
save). Save it in the same directory that your file cdf.data
is in, and LaTeX
it.
You should see the graphs of the pdf and the cdf. Much of the code in
cdf_graph_demo.tex
is setting up the plot. The relevant lines here are lines
27–31 We have an explicit formula for the pdf, so that is plotted by pgfplots on
lines 27–30. Line 31 is the following.
31\addplot [mark=none, thick] file {cdf.data};
This plots the cdf from the data that is in the file cdf.data
; the keyword file
tells pgfplots to look in the given file.
The new piece of Python you will need here is the idea of raw triple quotes to define a string. Note that on the first line of the following the ‘r’ stands for ‘raw’ and that you need three quotation marks at the beginning and at the end. Try the following.
string = r"""This string appears as typed. Things like \t and \n are not converted. It is like using \begin{verbatim} in LaTeX. """ print(string)
To get the middle pair of lines using the ordinary construction for a string you would have to use something like the following.
string_2 = "Things\nlike \\t and \\n are not converted." print(string_2)
So it is sometimes useful to use this construction if your string will have lots of backslashes and newlines, like in a tex file, as you will see below.
Now add the following code to your Python program cdf.py
.
33HEADER = r"""\begin{center} 34 \begin{tabular}{rr} 35 \toprule 36 $z$ & $\Phi(z)$ \\ 37 \otoprule 38""" 39 40FOOTER = r""" \bottomrule 41 \end{tabular} 42\end{center}""" 43 44with open("cdf_table.tex", mode="w") as tex_file: 45 tex_file.write(HEADER) 46 for i in range(0, len(z), 8): 47 tex_file.write(" ${0:.2f}$ & ${1:.6f}$ " 48 "\\\\ \n" 49 .format(z[i], cdf[i])) 50 tex_file.write(FOOTER)
Run the program. It should have created a file cdf_table.tex
in your working
directory. Open it and examine it. This is a ready-built piece of LaTeX that you
can input into any LaTeX file.
Add the following lines to the end of the LaTeX file cdf_graph_demo.tex
.
Here is the table. \input{cdf_table.tex}
Now LaTeX the file cdf_graph_demo.tex
and see the table appear.
Exercise 1. Copy your Python program from above and save it with a
different name, such as cdf_html.py
. Now edit the program so that it saves
the graph as an SVG image file cdf_graph.svg
and writes an HTML file
cdf.html
which displays the table of data of cdf values and displays the
SVG graph. You can base it on the HTML file cdf_incomplete.html
which
you can find on the module webpage (right click to save).