MAS116/MAS117 PRESENTATION LAB 11

On this sheet you will combine LaTeX and HTML with Python. First you will use SciPy to integrate numerically, then learn how to export data from Python by writing to files which can then be read by other software.

1. More integration and SciPy

Recall from MAS109 Introduction to Probability and Data Science that the probability density function (pdf) of the standard normal distribution N(0,1) is given by

ϕ(z) := 1 2πexp ( z2 2 ),

so that for Z a normally distributed random variable P(a Z b) = abϕ(z)dz. The cumulative distribution function (cdf) is given by

Φ(z) := P(Z z) =z 1 2πexp ( t2 2 )dt.

This integral has no simple expression so needs to be calculated numerically. You saw previously how to calculate approximations to definite integrals. However, in this case we want to define an integral from so the previous method wouldn’t work. We will use the integration functions from the SciPy (short for Scientific Python) module. To integrate a function f from a to b you use scipy.integrate.quad(f, a, b) as in the following.

1import numpy as np 
2import scipy.integrate as integrate 
3 
4 
5def pdf(z): 
6   """Calculate the probability density function.""" 
7   return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi) 
8 
9integral = integrate.quad(pdf, 0, 2) 
10print("The integral of phi from 0 to 2 is:", integral)

The output is a pair of numbers in round brackets: (result, error). The calculated result is within error of the actual integral. Note that this expression is not a list as it is in round brackets rather than square brackets. This expression is a type called a tuple; it is a bit like a list, but you can’t change a tuple. The important point here is that you access the individual entries in a tuple in the same way that you do with a list, so to obtain just the approximation to the integral and not the error, you should change the final line to the following.

10print("Prob(0 <= Z <= 2) =", integral[0], 
11     "plus or minus", integral[1])

This integration method will allow us to use and as limits. We do this with np.inf and -np.inf. For instance, add the following lines.

13integral = integrate.quad(pdf, -np.inf, 0) 
14print("Prob(Z <= 0) =", integral[0], 
15     "plus or minus", integral[1])

We will ignore the error term in this sheet, as you can often do unless you are interested in precise numerics. We can now get Python to plot the cdf within a range and to tabulate its values. Download the following program cdf.py from the module website (right-click, save-as). Run the program.

1import numpy as np 
2import scipy.integrate as integrate 
3import matplotlib.pyplot as plt 
4 
5 
6def pdf(x): 
7   """Calculate the probability density function.""" 
8   return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) 
9 
10 
11MAX = 3.5 
12 
13z = np.linspace(-MAX, MAX, int(2*MAX*16 + 1)) 
14 
15cdf = np.zeros_like(z) 
16for i in range(len(z)): 
17   cdf[i] = integrate.quad(pdf, -np.inf, z[i])[0] 
18 
19plt.plot(z, pdf(z)) 
20plt.plot(z, cdf) 
21plt.axis([-MAX, MAX, 0, 1.05]) 
22plt.show() 
23 
24print("\n{0:^5}  {1:^6}".format("z", "cdf(z)")) 
25print("-"*13) 
26for i in range(0, len(z), 8): 
27   print("{0:5.2f}  {1:.4f}".format(z[i], cdf[i]))

Let’s have a look at some of the lines.

Now in the rest of this lab we will look at ways of exporting the data we have just created, so that it can be used in other software.

2. Writing a file that pgfplots can read

A graph in a LaTeX document looks a lot better if it is drawn by pgfplots than if it is just an imported picture; for one thing, the fonts will actually match. We will get Python to export the function values to a file and then get pgfplots to read in the values from the file to draw the graph.

Add the following line to the above program and run it.

29with open("cdf.data", mode=’w’) as cdf_file: 
30   for i in range(0, len(z)): 
31       cdf_file.write("{0:.2f}  {1:.4f}\n" 
32                      .format(z[i], cdf[i]))

The file cdf.data should have been created in your working directory. Open the file in a text editor. You should see a table of z-values and the corresponding cummulative density function values.

Let’s analyse what was done here. It is very similar to the table printing at lines 26 and 27 of the program above. The main difference is that it is part of a with program block and we use the file_object.write() method rather than the print() function. To get python to write a file, the general form is as follows.

with open(file_name, mode=’w’) as file_object: 
   indented program block 
   containing statements of the form 
   file_object.write(string)

The first line creates a Python file object which refers to the file with the name file_name. The mode=’w’ part means that we will be ‘w’riting to this file. When this command is issued, if the file already exists then its contents are deleted. If you wanted to add some data to an already existing file then you use mode=’a’, which stands for ‘append’.

The command file_object.write(string) will just output the string string to the file with name file_name instead of printing it to the screen.

The file cdf_graph_demo.tex is available on the module webpage (right click to save). Save it in the same directory that your file cdf.data is in, and LaTeX it.

You should see the graphs of the pdf and the cdf. Much of the code in cdf_graph_demo.tex is setting up the plot. The relevant lines here are lines 27–31 We have an explicit formula for the pdf, so that is plotted by pgfplots on lines 27–30. Line 31 is the following.

31\addplot [mark=none, thick] file {cdf.data};

This plots the cdf from the data that is in the file cdf.data; the keyword file tells pgfplots to look in the given file.

3. Writing LaTeX and HTML with Python

The new piece of Python you will need here is the idea of raw triple quotes to define a string. Note that on the first line of the following the ‘r’ stands for ‘raw’ and that you need three quotation marks at the beginning and at the end. Try the following.

string = r"""This string appears as typed. 
Things 
like \t and \n are not converted. 
It is like using \begin{verbatim} in LaTeX. 
""" 
print(string)

To get the middle pair of lines using the ordinary construction for a string you would have to use something like the following.

string_2 = "Things\nlike \\t and \\n are not converted." 
print(string_2)

So it is sometimes useful to use this construction if your string will have lots of backslashes and newlines, like in a tex file, as you will see below.

Now add the following code to your Python program cdf.py.

33HEADER = r"""\begin{center} 
34  \begin{tabular}{rr} 
35   \toprule 
36   $z$ & $\Phi(z)$ \\ 
37   \otoprule 
38""" 
39 
40FOOTER = r"""    \bottomrule 
41  \end{tabular} 
42\end{center}""" 
43 
44with open("cdf_table.tex", mode="w") as tex_file: 
45   tex_file.write(HEADER) 
46   for i in range(0, len(z), 8): 
47       tex_file.write("    ${0:.2f}$ & ${1:.6f}$ " 
48                      "\\\\ \n" 
49                       .format(z[i], cdf[i])) 
50   tex_file.write(FOOTER)

Run the program. It should have created a file cdf_table.tex in your working directory. Open it and examine it. This is a ready-built piece of LaTeX that you can input into any LaTeX file.

Add the following lines to the end of the LaTeX file cdf_graph_demo.tex.

Here is the table. 
\input{cdf_table.tex}

Now LaTeX the file cdf_graph_demo.tex and see the table appear.

Exercise 1. Copy your Python program from above and save it with a different name, such as cdf_html.py. Now edit the program so that it saves the graph as an SVG image file cdf_graph.svg and writes an HTML file cdf.html which displays the table of data of cdf values and displays the SVG graph. You can base it on the HTML file cdf_incomplete.html which you can find on the module webpage (right click to save).