Commit 4592fa8

Tom's Feb 17 edit of a lecture
1 parent 8e0059b commit 4592fa8

File tree

1 file changed: lectures/multivariate_normal.md (129 additions, 7 deletions)
We start with a bivariate normal distribution pinned down by

$$
\mu=\left[\begin{array}{c}
.5 \\
1.0
\end{array}\right],\quad\Sigma=\left[\begin{array}{cc}
1 & .5\\
.5 & 1
\end{array}\right]
$$

```{code-cell} python3
μ = np.array([.5, 1.])
Σ = np.array([[1., .5], [.5, 1.]])

# construction of the multivariate normal instance
multi_normal = MultivariateNormal(μ, Σ)
```
```{code-cell} python3
k = 1 # choose partition

# partition and compute regression coefficients
multi_normal.partition(k)
multi_normal.βs[0], multi_normal.βs[1]
```
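The two coefficients in `βs` can be cross-checked directly against $\Sigma$: with a scalar partition, the population regression coefficient of one component on the other is just their covariance divided by the regressor's variance. A minimal sketch in plain NumPy (the names `beta_1on2` and `beta_2on1` are ours, not part of the lecture's class):

```{code-cell} python3
import numpy as np

Σ_check = np.array([[1., .5], [.5, 1.]])   # the Σ chosen above

# coefficient of z1 on z2: Cov(z1, z2) / Var(z2)
beta_1on2 = Σ_check[0, 1] / Σ_check[1, 1]
# coefficient of z2 on z1: Cov(z1, z2) / Var(z1)
beta_2on1 = Σ_check[1, 0] / Σ_check[0, 0]
print(beta_1on2, beta_2on1)  # 0.5 0.5
```

The two coefficients coincide here only because the two variances on the diagonal of $\Sigma$ are equal; in general they differ.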

Let's illustrate the fact that you _can regress anything on anything else_.

We have computed everything we need to form two regression lines, one of $z_1$ on $z_2$, the other of $z_2$ on $z_1$.

We'll represent these regressions as

$$
z_1 = a_1 + b_1 z_2 + \epsilon_1
$$

and

$$
z_2 = a_2 + b_2 z_1 + \epsilon_2
$$

where we impose the population least squares orthogonality conditions

$$
E \epsilon_1 z_2 = 0
$$

and

$$
E \epsilon_2 z_1 = 0
$$

Let's compute $a_1, a_2, b_1, b_2$.
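The orthogonality conditions pin down the slopes and intercepts. Substituting $\epsilon_1 = z_1 - a_1 - b_1 z_2$ into $E \epsilon_1 z_2 = 0$, together with the companion condition $E \epsilon_1 = 0$ that comes with including an intercept, gives

$$
b_1 = \frac{\mathrm{Cov}(z_1, z_2)}{\mathrm{Var}(z_2)} = \frac{\Sigma_{12}}{\Sigma_{22}}, \qquad a_1 = \mu_1 - b_1 \mu_2
$$

and, symmetrically,

$$
b_2 = \frac{\Sigma_{12}}{\Sigma_{11}}, \qquad a_2 = \mu_2 - b_2 \mu_1
$$

which is what the next code cell computes via `multi_normal.βs`.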
```{code-cell} python3
beta = multi_normal.βs

a1 = μ[0] - beta[0]*μ[1]
b1 = beta[0]

a2 = μ[1] - beta[1]*μ[0]
b2 = beta[1]
```
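With the $\mu$ and $\Sigma$ above, these formulas give $a_1 = 0$, $b_1 = 0.5$, $a_2 = 0.75$, $b_2 = 0.5$. One way to sanity-check the population values is to simulate a large sample from $N(\mu, \Sigma)$ and compute the sample least squares coefficients, which should come out close to them (a sketch; the seed and sample size here are arbitrary choices of ours):

```{code-cell} python3
import numpy as np

rng = np.random.default_rng(0)
draws = rng.multivariate_normal([.5, 1.], [[1., .5], [.5, 1.]],
                                size=200_000)
z1s, z2s = draws[:, 0], draws[:, 1]

# sample least squares of z1 on z2: slope from sample moments,
# intercept from the sample means
b1_hat = np.cov(z1s, z2s)[0, 1] / np.var(z2s, ddof=1)
a1_hat = z1s.mean() - b1_hat * z2s.mean()
print(a1_hat, b1_hat)  # close to the population values 0.0 and 0.5
```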
Let's print out the intercepts and slopes.

For the regression of $z_1$ on $z_2$ we have

```{code-cell} python3
print("a1 = ", a1)
print("b1 = ", b1)
```

For the regression of $z_2$ on $z_1$ we have

```{code-cell} python3
print("a2 = ", a2)
print("b2 = ", b2)
```
Now let's plot the two regression lines and stare at them.

```{code-cell} python3
z2 = np.linspace(-4, 4, 100)

a1 = np.squeeze(a1)
b1 = np.squeeze(b1)

a2 = np.squeeze(a2)
b2 = np.squeeze(b2)

# red line: regression of z1 on z2
z1 = b1*z2 + a1

# blue line: regression of z2 on z1, inverted so that it can be drawn
# in the same (z2, z1) axes
z1h = z2/b2 - a2/b2

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1)
ax.set(xlim=(-4, 4), ylim=(-4, 4))
ax.spines['left'].set_position('center')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
plt.ylabel('$z_1$', loc='top')
plt.xlabel('$z_2$', loc='right')
plt.title('two regressions')
plt.plot(z2, z1, 'r', label="$z_1$ on $z_2$")
plt.plot(z2, z1h, 'b', label="$z_2$ on $z_1$")
plt.legend()
plt.show()
```
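One feature worth noticing in the figure: both population regression lines pass through the mean point, here $(z_2, z_1) = (\mu_2, \mu_1) = (1.0, 0.5)$, because each intercept is chosen so that the regression holds exactly at the means. A quick arithmetic check, with the population intercepts and slopes implied by this $\mu$ and $\Sigma$ hardcoded so the snippet stands alone:

```{code-cell} python3
# population values implied by μ = (.5, 1.) and Σ = [[1, .5], [.5, 1]]
a1_, b1_, a2_, b2_ = 0.0, 0.5, 0.75, 0.5
μ1, μ2 = .5, 1.

red_at_mean = b1_ * μ2 + a1_           # red line evaluated at z2 = μ2
blue_at_mean = μ2 / b2_ - a2_ / b2_    # blue line evaluated at z2 = μ2
print(red_at_mean, blue_at_mean)  # both equal μ1 = 0.5
```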
The red line is the expectation of $z_1$ conditional on $z_2$.

The intercept and slope of the red line are

```{code-cell} python3
print("a1 = ", a1)
print("b1 = ", b1)
```

The blue line is the expectation of $z_2$ conditional on $z_1$.

The intercept and slope of the blue line are

```{code-cell} python3
print("-a2/b2 = ", -a2/b2)
print("1/b2 = ", 1/b2)
```

We can use these regression lines or our code to compute conditional expectations.

Let's compute the mean and variance of the distribution of $z_2$
conditional on $z_1=5$.
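The answer can be sketched by hand first, using the standard partitioned-Gaussian formulas $E[z_2 \mid z_1] = \mu_2 + \frac{\Sigma_{21}}{\Sigma_{11}}(z_1 - \mu_1)$ and $\mathrm{Var}(z_2 \mid z_1) = \Sigma_{22} - \frac{\Sigma_{21}\Sigma_{12}}{\Sigma_{11}}$ (the plain-NumPy names below are ours, not the lecture's class):

```{code-cell} python3
import numpy as np

mu = np.array([.5, 1.])
Sigma = np.array([[1., .5], [.5, 1.]])

z1_value = 5.
# E[z2 | z1 = 5] = μ2 + (Σ21/Σ11)(5 - μ1)
cond_mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (z1_value - mu[0])
# Var(z2 | z1) = Σ22 - Σ21 Σ12 / Σ11
cond_var = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]
print(cond_mean, cond_var)  # 3.25 0.75
```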

After that we'll reverse what is on the left and right sides of the regression.

```{code-cell} python3
# compute the cond. dist. of z2
ind = 1
