Commit 54580fb

first commit to chapter 7; just skeleton
1 parent 87619c2 commit 54580fb

2 files changed: 203 additions, 0 deletions
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+{
+ "metadata": {
+  "name": "DontOverfit"
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Implementation of Salisman's Don't Overfit submission"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "From [Kaggle](http://www.kaggle.com/c/overfitting):\n",
+      ">In order to achieve this we have created a simulated data set with 200 variables and 20,000 cases. An \u2018equation\u2019 based on this data was created in order to generate a Target to be predicted. Given the all 20,000 cases, the problem is very easy to solve \u2013 but you only get given the Target value of 250 cases \u2013 the task is to build a model that gives the best predictions on the remaining 19,750 cases."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "import gzip\n",
+      "import requests\n",
+      "url = \"\"  # placeholder: the data URL has not been filled in yet\n",
+      "data = requests.get(url)\n",
+      "# decompress the downloaded bytes in memory instead of opening a local file that was never written\n",
+      "file_content = gzip.decompress(data.content)\n"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
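Note on the setup quoted from Kaggle in this notebook: the split it describes (200 variables, 20,000 cases, 250 revealed targets, 19,750 held out) can be sketched without the real data. The following is only an illustration under stated assumptions. The function name `simulate`, the uniform inputs, and the thresholded weighted sum standing in for the competition's undisclosed "equation" are all hypothetical and are not taken from the competition or from this commit.

```python
import random

N_CASES = 20000     # total simulated cases, per the Kaggle description
N_VARIABLES = 200   # variables per case
N_LABELLED = 250    # cases whose Target value is revealed


def simulate(n_cases=N_CASES, n_variables=N_VARIABLES,
             n_labelled=N_LABELLED, seed=0):
    """Return (X, target, train_idx, test_idx) for a toy version of the setup."""
    rng = random.Random(seed)
    # Every variable is drawn uniformly from [0, 1) (an assumption).
    X = [[rng.random() for _ in range(n_variables)] for _ in range(n_cases)]
    # Hypothetical "equation": a thresholded random weighted sum of the variables.
    weights = [rng.uniform(-1, 1) for _ in range(n_variables)]
    # Expected value of the weighted sum for uniform inputs, so the
    # two classes come out roughly balanced.
    threshold = 0.5 * sum(weights)
    target = [int(sum(w * x for w, x in zip(weights, row)) > threshold)
              for row in X]
    # Only the first n_labelled targets would be visible to the modeller.
    train_idx = list(range(n_labelled))
    test_idx = list(range(n_labelled, n_cases))
    return X, target, train_idx, test_idx
```

Calling `simulate()` with the defaults gives 250 labelled rows and 19,750 held-out rows, which makes the imbalance between fitting data and evaluation data concrete: there are nearly as many variables as labelled cases.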
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
+{
+ "metadata": {
+  "name": "MachineLearning"
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "List of topics to cover:\n",
+      "\n",
+      "- Bayesian solution to overfitting\n",
+      "    - Salisman's solution to the Don't Overfit competition\n",
+      "- Predictive distributions; \"how do I evaluate testing data?\"\n",
+      "- Model fitting, BIC, and visualization tools\n",
+      "- Gaussian Processes\n",
+      "\n",
+      "\n",
+      "Would be nice/cool to cover:\n",
+      "\n",
+      "- classification models (using the book's text)\n",
+      "- Bayesian networks?"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "from IPython.core.display import HTML\n",
+      "def css_styling():\n",
+      "    styles = open(\"../styles/custom.css\", \"r\").read()\n",
+      "    return HTML(styles)\n",
+      "css_styling()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<style>\n",
+        " @font-face {\n",
+        " font-family: \"Computer Modern\";\n",
+        " src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
+        " }\n",
+        " div.cell{\n",
+        " width:800px;\n",
+        " margin-left:auto;\n",
+        " margin-right:auto;\n",
+        " }\n",
+        " h1 {\n",
+        " font-family: Helvetica, serif;\n",
+        " }\n",
+        " h4{\n",
+        " margin-top:12px;\n",
+        " margin-bottom: 3px;\n",
+        " }\n",
+        " div.text_cell_render{\n",
+        " font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
+        " line-height: 145%;\n",
+        " font-size: 130%;\n",
+        " width:800px;\n",
+        " margin-left:auto;\n",
+        " margin-right:auto;\n",
+        " }\n",
+        " .CodeMirror{\n",
+        " font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
+        " }\n",
+        " .prompt{\n",
+        " display: None;\n",
+        " }\n",
+        " .text_cell_render h5 {\n",
+        " font-weight: 300;\n",
+        " font-size: 16pt;\n",
+        " color: #4057A1;\n",
+        " font-style: italic;\n",
+        " margin-bottom: .5em;\n",
+        " margin-top: 0.5em;\n",
+        " display: block;\n",
+        " }\n",
+        " \n",
+        " .warning{\n",
+        " color: rgb( 240, 20, 20 )\n",
+        " }\n",
+        " \n",
+        "</style>\n",
+        "<script>\n",
+        " MathJax.Hub.Config({\n",
+        " TeX: {\n",
+        " extensions: [\"AMSmath.js\"]\n",
+        " },\n",
+        " tex2jax: {\n",
+        " inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
+        " displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
+        " },\n",
+        " displayAlign: 'center', // Change this to 'center' to center equations.\n",
+        " \"HTML-CSS\": {\n",
+        " styles: {'.MathJax_Display': {\"margin\": 4}}\n",
+        " }\n",
+        " });\n",
+        "</script>"
+       ],
+       "output_type": "pyout",
+       "prompt_number": 1,
+       "text": [
+        "<IPython.core.display.HTML at 0x5beaeb8>"
+       ]
+      }
+     ],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
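Note on the BIC item in the topic list: the skeleton does not yet say how BIC will be presented, so the following is only a sketch of the standard definition, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The function name `bic` and its argument names are hypothetical and are not part of this commit.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a preferred model.

    log_likelihood: maximized log-likelihood of the fitted model, ln(L)
    n_params:       number of free parameters, k
    n_obs:          number of observations, n
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

With only 250 labelled cases, as in the Don't Overfit setting, each extra parameter adds ln(250) (about 5.5) to the score, so a larger model has to raise the log-likelihood by roughly 2.76 per added parameter before BIC prefers it.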
