@@ -282,15 +282,55 @@ to compute the number of qualified observations in each sample; finally compute
Both the first and last few entries of the resulting data frame are printed
below to show that we end up with 20,000 point estimates, one for each of the 20,000 samples.

+ ``` {code-cell} ipython3
+ (
+ samples
+ .groupby('replicate')
+ ['room_type']
+ .value_counts(normalize=True)
+ )
+ ```
+
+ The returned object is a series,
+ and as we have previously learned,
+ we can use `reset_index` to convert it to a data frame.
+ However,
+ there is one caveat here:
+ when we call the `value_counts` method
+ on a grouped series and then call `reset_index`,
+ we end up with two columns with the same name
+ and therefore get an error
+ (in this case, `room_type` would occur twice).
+ Fortunately,
+ there is a simple solution:
+ when we call `reset_index`,
+ we can specify the name of the new column
+ with the `name` parameter:
+
+ ``` {code-cell} ipython3
+ (
+ samples
+ .groupby('replicate')
+ ['room_type']
+ .value_counts(normalize=True)
+ .reset_index(name='sample_proportion')
+ )
+ ```
+
+ Below we put everything together
+ and also filter the data frame to keep only the room type
+ that we are interested in (`Entire home/apt`).
+
``` {code-cell} ipython3
sample_estimates = (
samples
.groupby('replicate')
['room_type']
.value_counts(normalize=True)
.reset_index(name='sample_proportion')
- .query('room_type=="Entire home/apt"')
)
+
+ sample_estimates = sample_estimates[sample_estimates['room_type'] == 'Entire home/apt']
sample_estimates
```
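The pipeline in this diff can be traced end to end on a tiny, made-up data frame. The values below are hypothetical and stand in for the real `samples` data. The sketch shows each intermediate step. First, grouped `value_counts(normalize=True)` gives per-replicate proportions that sum to 1. Next, `reset_index(name='sample_proportion')` turns that series into a data frame whose values column has an explicit name, so it cannot collide with the `room_type` index level. Finally, a boolean mask keeps only the `'Entire home/apt'` rows.

```python
import pandas as pd

# Hypothetical toy stand-in for `samples`: two replicates of four rows each.
# The room types are made up for illustration, not taken from the real data.
samples = pd.DataFrame({
    'replicate': [0, 0, 0, 0, 1, 1, 1, 1],
    'room_type': [
        'Entire home/apt', 'Entire home/apt', 'Private room', 'Shared room',
        'Entire home/apt', 'Private room', 'Private room', 'Private room',
    ],
})

# Proportion of each room type within each replicate; the result is a
# series indexed by (replicate, room_type).
proportions = (
    samples
    .groupby('replicate')
    ['room_type']
    .value_counts(normalize=True)
)

# Give the values column an explicit name so that moving the index levels
# into columns cannot produce a second `room_type` column.
sample_estimates = proportions.reset_index(name='sample_proportion')

# Keep only the room type of interest using a boolean mask.
sample_estimates = sample_estimates[
    sample_estimates['room_type'] == 'Entire home/apt'
]
print(sample_estimates)
```

With this toy input, the filtered data frame has one row per replicate, with proportions 0.5 and 0.25. The boolean mask in the last step selects the same rows as the `.query('room_type=="Entire home/apt"')` call that the diff removes.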