diff --git a/doc/README.rst b/doc/README.rst index 1a105a7a65a81..660a3b7232891 100644 --- a/doc/README.rst +++ b/doc/README.rst @@ -33,8 +33,8 @@ Some other important things to know about the docs: itself and the docs in this folder ``pandas/doc/``. The docstrings provide a clear explanation of the usage of the individual - functions, while the documentation in this filder consists of tutorial-like - overviews per topic together with some other information (whatsnew, + functions, while the documentation in this folder consists of tutorial-like + overviews per topic together with some other information (what's new, installation, etc). - The docstrings follow the **Numpy Docstring Standard** which is used widely @@ -56,7 +56,7 @@ Some other important things to know about the docs: x = 2 x**3 - will be renderd as + will be rendered as :: @@ -66,7 +66,7 @@ Some other important things to know about the docs: Out[2]: 8 This means that almost all code examples in the docs are always run (and the - ouptut saved) during the doc build. This way, they will always be up to date, + output saved) during the doc build. This way, they will always be up to date, but it makes the doc building a bit more complex. @@ -135,12 +135,12 @@ If you want to do a full clean build, do:: Starting with 0.13.1 you can tell ``make.py`` to compile only a single section of the docs, greatly reducing the turn-around time for checking your changes. -You will be prompted to delete unrequired `.rst` files, since the last commited -version can always be restored from git. +You will be prompted to delete `.rst` files that aren't required, since the +last committed version can always be restored from git. :: - #omit autosummary and api section + #omit autosummary and API section python make.py clean python make.py --no-api diff --git a/doc/source/10min.rst b/doc/source/10min.rst index a9a97ee56813c..2111bb2d72dcb 100644 --- a/doc/source/10min.rst +++ b/doc/source/10min.rst @@ -260,7 +260,7 @@ For slicing columns explicitly df.iloc[:,1:3] -For getting a value explicity +For getting a value explicitly .. ipython:: python diff --git a/doc/source/basics.rst b/doc/source/basics.rst index 4d67616c5cd60..a503367c13427 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -346,7 +346,7 @@ General DataFrame Combine The ``combine_first`` method above calls the more general DataFrame method ``combine``. This method takes another DataFrame and a combiner function, aligns the input DataFrame and then passes the combiner function pairs of -Series (ie, columns whose names are the same). +Series (i.e., columns whose names are the same). So, for instance, to reproduce ``combine_first`` as above: @@ -1461,7 +1461,7 @@ from the current type (say ``int`` to ``float``) df3.dtypes The ``values`` attribute on a DataFrame return the *lower-common-denominator* of the dtypes, meaning -the dtype that can accommodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can +the dtype that can accommodate **ALL** of the types in the resulting homogeneous dtyped numpy array. This can force some *upcasting*. .. ipython:: python diff --git a/doc/source/cookbook.rst b/doc/source/cookbook.rst index 844112312cdce..fd68427a86951 100644 --- a/doc/source/cookbook.rst +++ b/doc/source/cookbook.rst @@ -499,7 +499,7 @@ The :ref:`HDFStores ` docs `Merging on-disk tables with millions of rows `__ -Deduplicating a large store by chunks, essentially a recursive reduction operation.
Shows a function for taking in data from +De-duplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from csv file and creating a store by chunks, with date parsing as well. `See here `__ diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst index 7c43a03e68013..928de285982cf 100644 --- a/doc/source/dsintro.rst +++ b/doc/source/dsintro.rst @@ -118,7 +118,7 @@ provided. The value will be repeated to match the length of **index** Series is ndarray-like ~~~~~~~~~~~~~~~~~~~~~~ -``Series`` acts very similary to a ``ndarray``, and is a valid argument to most NumPy functions. +``Series`` acts very similarly to a ``ndarray``, and is a valid argument to most NumPy functions. However, things like slicing also slice the index. .. ipython :: python @@ -474,7 +474,7 @@ DataFrame: For a more exhaustive treatment of more sophisticated label-based indexing and slicing, see the :ref:`section on indexing `. We will address the -fundamentals of reindexing / conforming to new sets of lables in the +fundamentals of reindexing / conforming to new sets of labels in the :ref:`section on reindexing `. Data alignment and arithmetic @@ -892,7 +892,7 @@ Slicing ~~~~~~~ Slicing works in a similar manner to a Panel. ``[]`` slices the first dimension. -``.ix`` allows you to slice abitrarily and get back lower dimensional objects +``.ix`` allows you to slice arbitrarily and get back lower dimensional objects .. ipython:: python diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst index 00c76632ce17b..e6b735173110b 100644 --- a/doc/source/enhancingperf.rst +++ b/doc/source/enhancingperf.rst @@ -553,7 +553,7 @@ standard Python. :func:`pandas.eval` Parsers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There are two different parsers and and two different engines you can use as +There are two different parsers and two different engines you can use as the backend. The default ``'pandas'`` parser allows a more intuitive syntax for expressing diff --git a/doc/source/faq.rst b/doc/source/faq.rst index 81bebab46dac9..a613d53218ce2 100644 --- a/doc/source/faq.rst +++ b/doc/source/faq.rst @@ -144,7 +144,7 @@ Frequency conversion Frequency conversion is implemented using the ``resample`` method on TimeSeries and DataFrame objects (multiple time series). ``resample`` also works on panels -(3D). Here is some code that resamples daily data to montly: +(3D). Here is some code that resamples daily data to monthly: .. ipython:: python diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst index 0078ffb506cc9..438e2f79c5ff3 100644 --- a/doc/source/gotchas.rst +++ b/doc/source/gotchas.rst @@ -183,7 +183,7 @@ Why not make NumPy like R? ~~~~~~~~~~~~~~~~~~~~~~~~~~ Many people have suggested that NumPy should simply emulate the ``NA`` support -present in the more domain-specific statistical programming langauge `R +present in the more domain-specific statistical programming language `R `__. Part of the reason is the NumPy type hierarchy: .. csv-table:: @@ -500,7 +500,7 @@ parse HTML tables in the top-level pandas io function ``read_html``. molasses. However consider the fact that many tables on the web are not big enough for the parsing algorithm runtime to matter. It is more likely that the bottleneck will be in the process of reading the raw - text from the url over the web, i.e., IO (input-output). For very large + text from the URL over the web, i.e., IO (input-output). For very large tables, this might not be true. 
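As an illustrative sketch of the point above (the URL here is a placeholder, not a real page), the typical call is a single line, so the time spent fetching the page usually dominates the time spent parsing it:

.. code-block:: python

   import pandas as pd

   # A hypothetical page containing one or more HTML <table> elements.
   url = 'http://example.com/tables.html'

   # read_html fetches the page and returns a *list* of DataFrames,
   # one per table found, so the result is indexed to pick a table.
   tables = pd.read_html(url)
   df = tables[0]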
**Issues with using** |Anaconda|_ diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst index 22f1414c4f2b0..eaccbfddc1f86 100644 --- a/doc/source/groupby.rst +++ b/doc/source/groupby.rst @@ -969,7 +969,7 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on df.groupby(df.sum(), axis=1).sum() -Returning a Series to propogate names +Returning a Series to propagate names ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Group DataFrame columns, compute a set of metrics and return a named Series. diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst index 84736d4989f6f..9c73c679f726a 100644 --- a/doc/source/indexing.rst +++ b/doc/source/indexing.rst @@ -88,10 +88,10 @@ of multi-axis indexing. See more at :ref:`Selection by Position ` - ``.ix`` supports mixed integer and label based access. It is primarily label - based, but will fallback to integer positional access. ``.ix`` is the most + based, but will fall back to integer positional access. ``.ix`` is the most general and will support any of the inputs to ``.loc`` and ``.iloc``, as well as support for floating point label schemes. ``.ix`` is especially useful - when dealing with mixed positional and label based hierarchial indexes. + when dealing with mixed positional and label based hierarchical indexes. As using integer slices with ``.ix`` have different behavior depending on whether the slice is interpreted as position based or label based, it's usually better to be explicit and use ``.iloc`` or ``.loc``. @@ -230,7 +230,7 @@ new column. - The ``Series/Panel`` accesses are available starting in 0.13.0. If you are using the IPython environment, you may also use tab-completion to -see these accessable attributes. +see these accessible attributes. Slicing ranges -------------- @@ -328,7 +328,7 @@ For getting values with a boolean array df1.loc['a']>0 df1.loc[:,df1.loc['a']>0] -For getting a value explicity (equiv to deprecated ``df.get_value('a','A')``) +For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``) .. ipython:: python @@ -415,7 +415,7 @@ For getting a cross section using an integer position (equiv to ``df.xs(1)``) df1.iloc[1] -There is one signficant departure from standard python/numpy slicing semantics. +There is one significant departure from standard python/numpy slicing semantics. python/numpy allow slicing past the end of an array without an associated error. .. ipython:: python @@ -494,7 +494,7 @@ out what you're asking for. If you only want to access a scalar value, the fastest way is to use the ``at`` and ``iat`` methods, which are implemented on all of the data structures. -Similary to ``loc``, ``at`` provides **label** based scalar lookups, while, ``iat`` provides **integer** based lookups analagously to ``iloc`` +Similarly to ``loc``, ``at`` provides **label** based scalar lookups, while, ``iat`` provides **integer** based lookups analogously to ``iloc`` .. ipython:: python @@ -643,7 +643,7 @@ To return a Series of the same shape as the original s.where(s > 0) -Selecting values from a DataFrame with a boolean critierion now also preserves +Selecting values from a DataFrame with a boolean criterion now also preserves input data shape. ``where`` is used under the hood as the implementation. Equivalent is ``df.where(df < 0)`` @@ -690,7 +690,7 @@ without creating a copy: **alignment** Furthermore, ``where`` aligns the input boolean condition (ndarray or DataFrame), -such that partial selection with setting is possible. 
This is analagous to +such that partial selection with setting is possible. This is analogous to partial setting via ``.ix`` (but on the contents rather than the axis labels) .. ipython:: python @@ -756,7 +756,7 @@ between the values of columns ``a`` and ``c``. For example: # query df.query('(a < b) & (b < c)') -Do the same thing but fallback on a named index if there is no column +Do the same thing but fall back on a named index if there is no column with the name ``a``. .. ipython:: python @@ -899,7 +899,7 @@ The ``in`` and ``not in`` operators ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~pandas.DataFrame.query` also supports special use of Python's ``in`` and -``not in`` comparison operators, providing a succint syntax for calling the +``not in`` comparison operators, providing a succinct syntax for calling the ``isin`` method of a ``Series`` or ``DataFrame``. .. ipython:: python @@ -1416,7 +1416,7 @@ faster, and allows one to index *both* axes if so desired. Why does the assignment when using chained indexing fail! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -So, why does this show the ``SettingWithCopy`` warning / and possibly not work when you do chained indexing and assignement: +So, why does this show the ``SettingWithCopy`` warning / and possibly not work when you do chained indexing and assignment: .. code-block:: python @@ -2149,7 +2149,7 @@ metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels`` to set these attributes directly. They default to returning a copy; however, -you can specify ``inplace=True`` to have the data change inplace. +you can specify ``inplace=True`` to have the data change in place. .. ipython:: python diff --git a/doc/source/io.rst b/doc/source/io.rst index cfa97ca0f3fef..fa6ab646a47c8 100644 --- a/doc/source/io.rst +++ b/doc/source/io.rst @@ -29,7 +29,7 @@ IO Tools (Text, CSV, HDF5, ...) ******************************* -The pandas I/O api is a set of top level ``reader`` functions accessed like ``pd.read_csv()`` that generally return a ``pandas`` +The pandas I/O API is a set of top level ``reader`` functions accessed like ``pd.read_csv()`` that generally return a ``pandas`` object. * :ref:`read_csv` @@ -78,8 +78,8 @@ for some advanced strategies They can take a number of arguments: - ``filepath_or_buffer``: Either a string path to a file, url - (including http, ftp, and s3 locations), or any object with a ``read`` + - ``filepath_or_buffer``: Either a string path to a file, URL + (including http, ftp, and S3 locations), or any object with a ``read`` method (such as an open file or ``StringIO``). - ``sep`` or ``delimiter``: A delimiter / separator to split fields on. `read_csv` is capable of inferring the delimiter automatically in some @@ -511,7 +511,7 @@ data columns: Date Parsing Functions ~~~~~~~~~~~~~~~~~~~~~~ Finally, the parser allows you can specify a custom ``date_parser`` function to -take full advantage of the flexiblity of the date parsing API: +take full advantage of the flexibility of the date parsing API: .. ipython:: python @@ -964,7 +964,7 @@ Reading columns with a ``MultiIndex`` By specifying list of row locations for the ``header`` argument, you can read in a ``MultiIndex`` for the columns. Specifying non-consecutive -rows will skip the interveaning rows. In order to have the pre-0.13 behavior +rows will skip the intervening rows. In order to have the pre-0.13 behavior of tupleizing columns, specify ``tupleize_cols=True``. .. ipython:: python @@ -1038,7 +1038,7 @@ rather than reading the entire file into memory, such as the following: table -By specifiying a ``chunksize`` to ``read_csv`` or ``read_table``, the return +By specifying a ``chunksize`` to ``read_csv`` or ``read_table``, the return value will be an iterable object of type ``TextFileReader``: .. ipython:: python @@ -1100,7 +1100,7 @@ function takes a number of arguments. Only the first is required. used. (A sequence should be given if the DataFrame uses MultiIndex). - ``mode`` : Python write mode, default 'w' - ``encoding``: a string representing the encoding to use if the contents are - non-ascii, for python versions prior to 3 + non-ASCII, for python versions prior to 3 - ``line_terminator``: Character sequence denoting line end (default '\\n') - ``quoting``: Set quoting rules as in csv module (default csv.QUOTE_MINIMAL) - ``quotechar``: Character used to quote fields (default '"') @@ -1184,7 +1184,7 @@ with optional parameters: - ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10. - ``force_ascii`` : force encoded string to be ASCII, default True. - ``date_unit`` : The time unit to encode to, governs timestamp and ISO8601 precision. One of 's', 'ms', 'us' or 'ns' for seconds, milliseconds, microseconds and nanoseconds respectively. Default 'ms'. -- ``default_handler`` : The handler to call if an object cannot otherwise be converted to a suitable format for JSON. Takes a single argument, which is the object to convert, and returns a serialisable object. +- ``default_handler`` : The handler to call if an object cannot otherwise be converted to a suitable format for JSON. Takes a single argument, which is the object to convert, and returns a serializable object. Note ``NaN``'s, ``NaT``'s and ``None`` will be converted to ``null`` and ``datetime`` objects will be converted based on the ``date_format`` and ``date_unit`` parameters. @@ -1208,7 +1208,7 @@ file / string. Consider the following DataFrame and Series: sjo = Series(dict(x=15, y=16, z=17), name='D') sjo -**Column oriented** (the default for ``DataFrame``) serialises the data as +**Column oriented** (the default for ``DataFrame``) serializes the data as nested JSON objects with column labels acting as the primary index: .. ipython:: python @@ -1224,7 +1224,7 @@ but the index labels are now primary: dfjo.to_json(orient="index") sjo.to_json(orient="index") -**Record oriented** serialises the data to a JSON array of column -> value records, +**Record oriented** serializes the data to a JSON array of column -> value records, index labels are not included. This is useful for passing DataFrame data to plotting libraries, for example the JavaScript library d3.js: .. ipython:: python @@ -1233,7 +1233,7 @@ libraries, for example the JavaScript library d3.js: dfjo.to_json(orient="records") sjo.to_json(orient="records") -**Value oriented** is a bare-bones option which serialises to nested JSON arrays of +**Value oriented** is a bare-bones option which serializes to nested JSON arrays of values only, column and index labels are not included: .. ipython:: python @@ -1241,7 +1241,7 @@ values only, column and index labels are not included: dfjo.to_json(orient="values") # Not available for Series -**Split oriented** serialises to a JSON object containing separate entries for +**Split oriented** serializes to a JSON object containing separate entries for values, index and columns. Name is also included for ``Series``: .. ipython:: python @@ -1252,13 +1252,13 @@ values, index and columns. Name is also included for ``Series``: .. note:: Any orient option that encodes to a JSON object will not preserve the ordering of - index and column labels during round-trip serialisation. If you wish to preserve + index and column labels during round-trip serialization. If you wish to preserve label ordering use the `split` option as it uses ordered containers. Date Handling +++++++++++++ -Writing in iso date format +Writing in ISO date format .. ipython:: python @@ -1268,7 +1268,7 @@ Writing in iso date format json = dfd.to_json(date_format='iso') json -Writing in iso date format, with microseconds +Writing in ISO date format, with microseconds .. ipython:: python @@ -1297,17 +1297,17 @@ Writing to a file, with a date index and a date column Fallback Behavior +++++++++++++++++ -If the JSON serialiser cannot handle the container contents directly it will fallback in the following manner: +If the JSON serializer cannot handle the container contents directly it will fall back in the following manner: - if a ``toDict`` method is defined by the unrecognised object then that - will be called and its returned ``dict`` will be JSON serialised. + will be called and its returned ``dict`` will be JSON serialized. - if a ``default_handler`` has been passed to ``to_json`` that will be called to convert the object. - otherwise an attempt is made to convert the object to a ``dict`` by parsing its contents. However if the object is complex this will often fail with an ``OverflowError``. -Your best bet when encountering ``OverflowError`` during serialisation +Your best bet when encountering ``OverflowError`` during serialization is to specify a ``default_handler``. For example ``timedelta`` can cause problems: @@ -1346,10 +1346,10 @@ Reading JSON Reading a JSON string to pandas object can take a number of parameters. The parser will try to parse a ``DataFrame`` if ``typ`` is not supplied or -is ``None``. To explicity force ``Series`` parsing, pass ``typ=series`` +is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series`` - ``filepath_or_buffer`` : a **VALID** JSON string or file handle / StringIO. The string could be - a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host + a URL. Valid URL schemes include http, ftp, S3, and file. For file URLs, a host is expected. For instance, a local file could be file ://localhost/path/to/table.json - ``typ`` : type of object to recover (series or frame), default 'frame' @@ -1377,8 +1377,8 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series`` - ``dtype`` : if True, infer dtypes, if a dict of column to dtype, then use those, if False, then don't infer dtypes at all, default is True, apply only to the data - ``convert_axes`` : boolean, try to convert the axes to the proper dtypes, default is True -- ``convert_dates`` : a list of columns to parse for dates; If True, then try to parse datelike columns, default is True -- ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default datelike columns +- ``convert_dates`` : a list of columns to parse for dates; If True, then try to parse date-like columns, default is True +- ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default date-like columns - ``numpy`` : direct decoding to numpy arrays. default is False; Supports numeric data only, although labels may be non-numeric.
Also note that the JSON ordering **MUST** be the same for each term if ``numpy=True`` - ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality @@ -1387,7 +1387,7 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series`` then pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to seconds, milliseconds, microseconds or nanoseconds respectively. -The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parsable. +The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parseable. If a non-default ``orient`` was used when encoding to JSON be sure to pass the same option here so that decoding produces sensible results, see `Orient Options`_ for an @@ -1438,7 +1438,7 @@ Specify dtypes for conversion: pd.read_json('test.json', dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes -Preserve string indicies: +Preserve string indices: .. ipython:: python @@ -1480,7 +1480,7 @@ The Numpy Parameter This supports numeric data only. Index and columns labels may be non-numeric, e.g. strings, dates etc. If ``numpy=True`` is passed to ``read_json`` an attempt will be made to sniff -an appropriate dtype during deserialisation and to subsequently decode directly +an appropriate dtype during deserialization and to subsequently decode directly to numpy arrays, bypassing the need for intermediate Python objects. This can provide speedups if you are deserialising a large amount of numeric @@ -1502,7 +1502,7 @@ data: timeit read_json(jsonfloats, numpy=True) -The speedup is less noticable for smaller datasets: +The speedup is less noticeable for smaller datasets: .. ipython:: python @@ -1586,7 +1586,7 @@ Reading HTML Content .. versionadded:: 0.12.0 The top-level :func:`~pandas.io.html.read_html` function can accept an HTML -string/file/url and will parse HTML tables into list of pandas DataFrames. +string/file/URL and will parse HTML tables into list of pandas DataFrames. Let's look at a few examples. .. note:: @@ -2381,7 +2381,7 @@ hierarchical path-name like format (e.g. ``foo/bar/bah``), which will generate a hierarchy of sub-stores (or ``Groups`` in PyTables parlance). Keys can be specified with out the leading '/' and are ALWAYS absolute (e.g. 'foo' refers to '/foo'). Removal operations can remove -everying in the sub-store and BELOW, so be *careful*. +everything in the sub-store and BELOW, so be *careful*. .. ipython:: python @@ -2516,7 +2516,7 @@ The ``indexers`` are on the left-hand side of the sub-expression: - ``columns``, ``major_axis``, ``ts`` -The right-hand side of the sub-expression (after a comparsion operator) can be: +The right-hand side of the sub-expression (after a comparison operator) can be: - functions that will be evaluated, e.g. ``Timestamp('2012-02-01')`` - strings, e.g. ``"bar"`` @@ -2696,7 +2696,7 @@ be data_columns # columns are stored separately as ``PyTables`` columns store.root.df_dc.table -There is some performance degredation by making lots of columns into +There is some performance degradation by making lots of columns into `data columns`, so it is up to the user to designate these. In addition, you cannot change data columns (nor indexables) after the first append/put operation (Of course you can simply read in the data and @@ -2935,7 +2935,7 @@ after the fact. 
- ``ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5`` Furthermore ``ptrepack in.h5 out.h5`` will *repack* the file to allow -you to reuse previously deleted space. Aalternatively, one can simply +you to reuse previously deleted space. Alternatively, one can simply remove the file and write again, or use the ``copy`` method. .. _io.hdf5-notes: @@ -2996,7 +2996,7 @@ Currently, ``unicode`` and ``datetime`` columns (represented with a dtype of ``object``), **WILL FAIL**. In addition, even though a column may look like a ``datetime64[ns]``, if it contains ``np.nan``, this **WILL FAIL**. You can try to convert datetimelike columns to proper -``datetime64[ns]`` columns, that possibily contain ``NaT`` to represent +``datetime64[ns]`` columns, that possibly contain ``NaT`` to represent invalid values. (Some of these issues have been addressed and these conversion may not be necessary in future versions of pandas) @@ -3025,7 +3025,7 @@ may introduce a string for a column **larger** than the column can hold, an Exce could have a silent truncation of these columns, leading to loss of information). In the future we may relax this and allow a user-specified truncation to occur. -Pass ``min_itemsize`` on the first table creation to a-priori specifiy the minimum length of a particular string column. +Pass ``min_itemsize`` on the first table creation to a-priori specify the minimum length of a particular string column. ``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to allow all *indexables* or *data_columns* to have this min_itemsize. @@ -3070,7 +3070,7 @@ External Compatibility ~~~~~~~~~~~~~~~~~~~~~~ ``HDFStore`` write ``table`` format objects in specific formats suitable for -producing loss-less roundtrips to pandas objects. For external +producing loss-less round trips to pandas objects. For external compatibility, ``HDFStore`` can read native ``PyTables`` format tables. It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the ``rhdf5`` library. Create a table @@ -3136,7 +3136,7 @@ Performance generally longer as compared with regular stores. Query times can be quite fast, especially on an indexed axis. - You can pass ``chunksize=`` to ``append``, specifying the - write chunksize (default is 50000). This will signficantly lower + write chunksize (default is 50000). This will significantly lower your memory usage on writing. - You can pass ``expectedrows=`` to the first ``append``, to set the TOTAL number of expected rows that ``PyTables`` will @@ -3304,7 +3304,7 @@ And you can explicitly force columns to be parsed as dates: pd.read_sql_table('data', engine, parse_dates=['Date']) -If needed you can explicitly specifiy a format string, or a dict of arguments +If needed you can explicitly specify a format string, or a dict of arguments to pass to :func:`pandas.to_datetime`: .. code-block:: python @@ -3456,7 +3456,7 @@ response code of Google BigQuery can be successful (200) even if the append failed. For this reason, if there is a failure to append to the table, the complete error response from BigQuery is returned which can be quite long given it provides a status for each row. You may want -to start with smaller chuncks to test that the size and types of your +to start with smaller chunks to test that the size and types of your dataframe match your destination table to make debugging simpler. .. code-block:: python @@ -3470,7 +3470,7 @@ The BigQuery SQL query language has some oddities, see `here regex) All of the regular expression examples can also be passed with the ``to_replace`` argument as the ``regex`` argument. In this case the ``value`` -argument must be passed explicity by name or ``regex`` must be a nested +argument must be passed explicitly by name or ``regex`` must be a nested dictionary. The previous example, in this case, would then be .. ipython:: python @@ -566,7 +566,7 @@ want to use a regular expression. Numeric Replacement ~~~~~~~~~~~~~~~~~~~ -Similiar to ``DataFrame.fillna`` +Similar to ``DataFrame.fillna`` .. ipython:: python :suppress: diff --git a/doc/source/options.rst b/doc/source/options.rst index 961797acb00aa..1e8517014bfc5 100644 --- a/doc/source/options.rst +++ b/doc/source/options.rst @@ -166,7 +166,7 @@ dataframes to stretch across pages, wrapped over the full column vs row-wise. pd.reset_option('max_rows') ``display.max_columnwidth`` sets the maximum width of columns. Cells -of this length or longer will be truncated with an elipsis. +of this length or longer will be truncated with an ellipsis. .. ipython:: python diff --git a/doc/source/overview.rst b/doc/source/overview.rst index 8e47466385e77..49a788def2854 100644 --- a/doc/source/overview.rst +++ b/doc/source/overview.rst @@ -18,7 +18,7 @@ Package overview * Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format. - * Memory-efficent "sparse" versions of the standard data structures for storing + * Memory-efficient "sparse" versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value) * Moving window statistics (rolling mean, rolling standard deviation, etc.) * Static and moving window linear and `panel regression diff --git a/doc/source/release.rst b/doc/source/release.rst index e490cb330a497..9dc96219f42d9 100644 --- a/doc/source/release.rst +++ b/doc/source/release.rst @@ -301,8 +301,8 @@ Improvements to existing features limit precision based on the values in the array (:issue:`3401`) - ``pd.show_versions()`` is now available for convenience when reporting issues.
- perf improvements to Series.str.extract (:issue:`5944`) -- perf improvments in ``dtypes/ftypes`` methods (:issue:`5968`) -- perf improvments in indexing with object dtypes (:issue:`5968`) +- perf improvements in ``dtypes/ftypes`` methods (:issue:`5968`) +- perf improvements in indexing with object dtypes (:issue:`5968`) - improved dtype inference for ``timedelta`` like passed to constructors (:issue:`5458`, :issue:`5689`) - escape special characters when writing to latex (:issue: `5374`) - perf improvements in ``DataFrame.apply`` (:issue:`6013`) @@ -329,7 +329,7 @@ Bug Fixes - Bug in groupby dtype conversion with datetimelike (:issue:`5869`) - Regression in handling of empty Series as indexers to Series (:issue:`5877`) - Bug in internal caching, related to (:issue:`5727`) -- Testing bug in reading json/msgpack from a non-filepath on windows under py3 (:issue:`5874`) +- Testing bug in reading JSON/msgpack from a non-filepath on windows under py3 (:issue:`5874`) - Bug when assigning to .ix[tuple(...)] (:issue:`5896`) - Bug in fully reindexing a Panel (:issue:`5905`) - Bug in idxmin/max with object dtypes (:issue:`5914`) @@ -337,7 +337,7 @@ Bug Fixes - Bug in assigning to chained series with a series via ix (:issue:`5928`) - Bug in creating an empty DataFrame, copying, then assigning (:issue:`5932`) - Bug in DataFrame.tail with empty frame (:issue:`5846`) -- Bug in propogating metadata on ``resample`` (:issue:`5862`) +- Bug in propagating metadata on ``resample`` (:issue:`5862`) - Fixed string-representation of ``NaT`` to be "NaT" (:issue:`5708`) - Fixed string-representation for Timestamp to show nanoseconds if present (:issue:`5912`) - ``pd.match`` not returning passed sentinel @@ -638,7 +638,7 @@ API Changes - support ``timedelta64[ns]`` as a serialization type (:issue:`3577`) - store `datetime.date` objects as ordinals rather then timetuples to avoid timezone issues (:issue:`2852`), thanks @tavistmorph and @numpand - - ``numexpr`` 2.2.2 fixes incompatiblity in PyTables 2.4 (:issue:`4908`) + - ``numexpr`` 2.2.2 fixes incompatibility in PyTables 2.4 (:issue:`4908`) - ``flush`` now accepts an ``fsync`` parameter, which defaults to ``False`` (:issue:`5364`) - ``unicode`` indices not supported on ``table`` formats (:issue:`5386`) @@ -649,7 +649,7 @@ API Changes Options are seconds, milliseconds, microseconds and nanoseconds. (:issue:`4362`, :issue:`4498`). - added ``default_handler`` parameter to allow a callable to be passed - which will be responsible for handling otherwise unserialisable objects. + which will be responsible for handling otherwise unserializable objects. (:issue:`5138`) - ``Index`` and ``MultiIndex`` changes (:issue:`4039`): @@ -723,7 +723,7 @@ API Changes ``SparsePanel``, etc.), now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). ``SparsePanel`` does not support ``pow`` or ``mod`` with non-scalars.
(:issue:`3765`) -- Arithemtic func factories are now passed real names (suitable for using +- Arithmetic func factories are now passed real names (suitable for using with super) (:issue:`5240`) - Provide numpy compatibility with 1.7 for a calling convention like ``np.prod(pandas_object)`` as numpy call with additional keyword args @@ -802,7 +802,7 @@ See :ref:`Internal Refactoring` - ``swapaxes`` on a ``Panel`` with the same axes specified now return a copy - support attribute access for setting - - ``filter`` supports same api as original ``DataFrame`` filter + - ``filter`` supports same API as original ``DataFrame`` filter - ``fillna`` refactored to ``core/generic.py``, while > 3ndim is ``NotImplemented`` @@ -836,7 +836,7 @@ See :ref:`Internal Refactoring` - added ``ftypes`` method to Series/DataFame, similar to ``dtypes``, but indicates if the underlying is sparse/dense (as well as the dtype) - All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used - to indcated various values to propogate to a new object from an existing + to indicate various values to propagate to a new object from an existing (e.g. name in ``Series`` will follow more automatically now) - Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)`` without having to directly import the @@ -855,7 +855,7 @@ See :ref:`Internal Refactoring` elements (:issue:`1903`) - Refactor ``clip`` methods to core/generic.py (:issue:`4798`) - Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py, - allowing Series/Panel functionaility + allowing Series/Panel functionality - Refactor of Series arithmetic with time-like objects (datetime/timedelta/time etc.) into a separate, cleaned up wrapper class. (:issue:`4613`) @@ -927,7 +927,7 @@ Bug Fixes as the docstring says (:issue:`4362`). - ``as_index`` is no longer ignored when doing groupby apply (:issue:`4648`, :issue:`3417`) -- JSON NaT handling fixed, NaTs are now serialised to `null` (:issue:`4498`) +- JSON NaT handling fixed, NaTs are now serialized to `null` (:issue:`4498`) - Fixed JSON handling of escapable characters in JSON object keys (:issue:`4593`) - Fixed passing ``keep_default_na=False`` when ``na_values=None`` @@ -1086,7 +1086,7 @@ Bug Fixes - Fix a bug where reshaping a ``Series`` to its own shape raised ``TypeError`` (:issue:`4554`) and other reshaping issues. - Bug in setting with ``ix/loc`` and a mixed int/string index (:issue:`4544`) -- Make sure series-series boolean comparions are label based (:issue:`4947`) +- Make sure series-series boolean comparisons are label based (:issue:`4947`) - Bug in multi-level indexing with a Timestamp partial indexer (:issue:`4294`) - Tests/fix for multi-index construction of an all-nan frame (:issue:`4078`) @@ -1096,7 +1096,7 @@ Bug Fixes ordering of returned tables (:issue:`4770`, :issue:`5029`). - Fixed a bug where :func:`~pandas.read_html` was incorrectly parsing when passed ``index_col=0`` (:issue:`5066`). -- Fixed a bug where :func:`~pandas.read_html` was incorrectly infering the +- Fixed a bug where :func:`~pandas.read_html` was incorrectly inferring the type of headers (:issue:`5048`). - Fixed a bug where ``DatetimeIndex`` joins with ``PeriodIndex`` caused a stack overflow (:issue:`3899`). @@ -1203,7 +1203,7 @@ New Features - Added support for writing in ``to_csv`` and reading in ``read_csv``, multi-index columns. The ``header`` option in ``read_csv`` now accepts a list of the rows from which to read the index. 
Added the option, - ``tupleize_cols`` to provide compatiblity for the pre 0.12 behavior of + ``tupleize_cols`` to provide compatibility for the pre 0.12 behavior of writing and reading multi-index columns via a list of tuples. The default in 0.12 is to write lists of tuples and *not* interpret list of tuples as a multi-index column. @@ -1250,7 +1250,7 @@ Improvements to existing features :issue:`3572`, :issue:`3911`, :issue:`3912`), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place which - elimnates any spurious plots from showing up. + eliminates any spurious plots from showing up. - Added Faq section on repr display options, to help users customize their setup. - ``where`` operations that result in block splitting are much faster (:issue:`3733`) - Series and DataFrame hist methods now take a ``figsize`` argument (:issue:`3834`) @@ -1258,7 +1258,7 @@ Improvements to existing features operations (:issue:`3877`) - Add ``unit`` keyword to ``Timestamp`` and ``to_datetime`` to enable passing of integers or floats that are in an epoch unit of ``D, s, ms, us, ns``, thanks @mtkini (:issue:`3969`) - (e.g. unix timestamps or epoch ``s``, with fracional seconds allowed) (:issue:`3540`) + (e.g. unix timestamps or epoch ``s``, with fractional seconds allowed) (:issue:`3540`) - DataFrame corr method (spearman) is now cythonized. - Improved ``network`` test decorator to catch ``IOError`` (and therefore ``URLError`` as well). Added ``with_connectivity_check`` decorator to allow @@ -1296,7 +1296,7 @@ API Changes ``timedelta64[ns]`` to ``object/int`` (:issue:`3425`) - The behavior of ``datetime64`` dtypes has changed with respect to certain so-called reduction operations (:issue:`3726`). The following operations now - raise a ``TypeError`` when perfomed on a ``Series`` and return an *empty* + raise a ``TypeError`` when performed on a ``Series`` and return an *empty* ``Series`` when performed on a ``DataFrame`` similar to performing these operations on, for example, a ``DataFrame`` of ``slice`` objects: - sum, prod, mean, std, var, skew, kurt, corr, and cov @@ -1335,7 +1335,7 @@ API Changes deprecated - set FutureWarning to require data_source, and to replace year/month with expiry date in pandas.io options. This is in preparation to add options - data from google (:issue:`3822`) + data from Google (:issue:`3822`) - the ``method`` and ``axis`` arguments of ``DataFrame.replace()`` are deprecated - Implement ``__nonzero__`` for ``NDFrame`` objects (:issue:`3691`, :issue:`3696`) @@ -1452,13 +1452,13 @@ Bug Fixes their first argument (:issue:`3702`) - Fix file tokenization error with \r delimiter and quoted fields (:issue:`3453`) - Groupby transform with item-by-item not upcasting correctly (:issue:`3740`) -- Incorrectly read a HDFStore multi-index Frame witha column specification (:issue:`3748`) +- Incorrectly read a HDFStore multi-index Frame with a column specification (:issue:`3748`) - ``read_html`` now correctly skips tests (:issue:`3741`) - PandasObjects raise TypeError when trying to hash (:issue:`3882`) - Fix incorrect arguments passed to concat that are not list-like (e.g. 
concat(df1,df2)) (:issue:`3481`) - Correctly parse when passed the ``dtype=str`` (or other variable-len string dtypes) in ``read_csv`` (:issue:`3795`) -- Fix index name not propogating when using ``loc/ix`` (:issue:`3880`) +- Fix index name not propagating when using ``loc/ix`` (:issue:`3880`) - Fix groupby when applying a custom function resulting in a returned DataFrame was not converting dtypes (:issue:`3911`) - Fixed a bug where ``DataFrame.replace`` with a compiled regular expression @@ -1468,7 +1468,7 @@ Bug Fixes - Indexing with a string with seconds resolution not selecting from a time index (:issue:`3925`) - csv parsers would loop infinitely if ``iterator=True`` but no ``chunksize`` was specified (:issue:`3967`), python parser failing with ``chunksize=1`` -- Fix index name not propogating when using ``shift`` +- Fix index name not propagating when using ``shift`` - Fixed dropna=False being ignored with multi-index stack (:issue:`3997`) - Fixed flattening of columns when renaming MultiIndex columns DataFrame (:issue:`4004`) - Fix ``Series.clip`` for datetime series. NA/NaN threshold values will now throw ValueError (:issue:`3996`) @@ -1523,17 +1523,17 @@ New Features - New documentation section, ``10 Minutes to Pandas`` - New documentation section, ``Cookbook`` -- Allow mixed dtypes (e.g ``float32/float64/int32/int16/int8``) to coexist in DataFrames and propogate in operations +- Allow mixed dtypes (e.g ``float32/float64/int32/int16/int8``) to coexist in DataFrames and propagate in operations - Add function to pandas.io.data for retrieving stock index components from Yahoo! finance (:issue:`2795`) - Support slicing with time objects (:issue:`2681`) - Added ``.iloc`` attribute, to support strict integer based indexing, analogous to ``.ix`` (:issue:`2922`) -- Added ``.loc`` attribute, to support strict label based indexing, analagous to ``.ix`` (:issue:`3053`) +- Added ``.loc`` attribute, to support strict label based indexing, analogous to ``.ix`` (:issue:`3053`) - Added ``.iat`` attribute, to support fast scalar access via integers (replaces ``iget_value/iset_value``) - Added ``.at`` attribute, to support fast scalar access via labels (replaces ``get_value/set_value``) -- Moved functionaility from ``irow,icol,iget_value/iset_value`` to ``.iloc`` indexer (via ``_ixs`` methods in each object) +- Moved functionality from ``irow,icol,iget_value/iset_value`` to ``.iloc`` indexer (via ``_ixs`` methods in each object) - Added support for expression evaluation using the ``numexpr`` library - Added ``convert=boolean`` to ``take`` routines to translate negative indices to positive, defaults to True -- Added to_series() method to indices, to facilitate the creation of indexeres (:issue:`3275`) +- Added to_series() method to indices, to facilitate the creation of indexers (:issue:`3275`) Improvements to existing features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1760,7 +1760,7 @@ Bug Fixes - Fixed a bug in the legend of plotting.andrews_curves() (:issue:`3278`) - Produce a series on apply if we only generate a singular series and have a simple index (:issue:`2893`) -- Fix Python ascii file parsing when integer falls outside of floating point +- Fix Python ASCII file parsing when integer falls outside of floating point spacing (:issue:`3258`) - fixed pretty printing of sets (:issue:`3294`) - Panel() and Panel.from_dict() now respects ordering when given OrderedDict (:issue:`3303`) @@ -1783,7 +1783,7 @@ pandas 0.10.1 New Features ~~~~~~~~~~~~ -- Add data inferface to World Bank WDI pandas.io.wb
(:issue:`2592`) +- Add data interface to World Bank WDI pandas.io.wb (:issue:`2592`) API Changes ~~~~~~~~~~~ @@ -1822,7 +1822,7 @@ Improvements to existing features - added method ``copy`` to copy an existing store (and possibly upgrade) - show the shape of the data on disk for non-table stores when printing the store - - added ability to read PyTables flavor tables (allows compatiblity to + - added ability to read PyTables flavor tables (allows compatibility to other HDF5 systems) - Add ``logx`` option to DataFrame/Series.plot (:issue:`2327`, :issue:`2565`) @@ -1837,7 +1837,7 @@ Improvements to existing features - Add methods ``neg`` and ``inv`` to Series - Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS or XLSX file (:issue:`2613`) -- Documented a fast-path in pd.read_Csv when parsing iso8601 datetime strings +- Documented a fast-path in pd.read_csv when parsing iso8601 datetime strings yielding as much as a 20x speedup. (:issue:`5993`) @@ -1955,7 +1955,7 @@ New Features Experimental Features ~~~~~~~~~~~~~~~~~~~~~ -- Add support for Panel4D, a named 4 Dimensional stucture +- Add support for Panel4D, a named 4 Dimensional structure - Add support for ndpanel factory functions, to create custom, domain-specific N-Dimensional containers @@ -2008,7 +2008,7 @@ Improvements to existing features - Add ``normalize`` option to Series/DataFrame.asfreq (:issue:`2137`) - SparseSeries and SparseDataFrame construction from empty and scalar values now no longer create dense ndarrays unnecessarily (:issue:`2322`) -- ``HDFStore`` now supports hierarchial keys (:issue:`2397`) +- ``HDFStore`` now supports hierarchical keys (:issue:`2397`) - Support multiple query selection formats for ``HDFStore tables`` (:issue:`1996`) - Support ``del store['df']`` syntax to delete HDFStores - Add multi-dtype support for ``HDFStore tables`` @@ -2077,7 +2077,7 @@ Bug Fixes - Fix DataFrame row indexing case with MultiIndex (:issue:`2314`) - Fix to_excel exporting issues with Timestamp objects in index (:issue:`2294`) - Fixes assigning scalars and array to hierarchical column chunk (:issue:`1803`) -- Fixed a UnicdeDecodeError with series tidy_repr (:issue:`2225`) +- Fixed a UnicodeDecodeError with series tidy_repr (:issue:`2225`) - Fixed issued with duplicate keys in an index (:issue:`2347`, :issue:`2380`) - Fixed issues re: Hash randomization, default on starting w/ py3.3 (:issue:`2331`) - Fixed issue with missing attributes after loading a pickled dataframe (:issue:`2431`) @@ -2783,7 +2783,7 @@ Bug Fixes (:issue:`1013`) - DataFrame.plot(logy=True) has no effect (:issue:`1011`). 
- Broken arithmetic operations between SparsePanel-Panel (:issue:`1015`) -- Unicode repr issues in MultiIndex with non-ascii characters (:issue:`1010`) +- Unicode repr issues in MultiIndex with non-ASCII characters (:issue:`1010`) - DataFrame.lookup() returns inconsistent results if exact match not present (:issue:`1001`) - DataFrame arithmetic operations not treating None as NA (:issue:`992`) @@ -2794,7 +2794,7 @@ Bug Fixes - DataFrame.plot(kind='bar') ignores color argument (:issue:`958`) - Inconsistent Index comparison results (:issue:`948`) - Improper int dtype DataFrame construction from data with NaN (:issue:`846`) -- Removes default 'result' name in grouby results (:issue:`995`) +- Removes default 'result' name in groupby results (:issue:`995`) - DataFrame.from_records no longer mutate input columns (:issue:`975`) - Use Index name when grouping by it (:issue:`1313`) @@ -3866,7 +3866,7 @@ pandas 0.4.1 **Release date:** 9/25/2011 This is primarily a bug fix release but includes some new features and -provements +improvements New Features ~~~~~~~~~~~~ diff --git a/doc/source/rplot.rst b/doc/source/rplot.rst index cdecee39d8d1e..46b57cea2d9ed 100644 --- a/doc/source/rplot.rst +++ b/doc/source/rplot.rst @@ -99,7 +99,7 @@ The plot above shows that it is possible to have two or more plots for the same @savefig rplot4_tips.png plot.render(plt.gcf()) -Above is a similar plot but with 2D kernel desnity estimation plot superimposed. +Above is a similar plot but with 2D kernel density estimation plot superimposed. .. ipython:: python diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst index 76bc796beced8..cbfb20c6f9d7d 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/timeseries.rst @@ -379,9 +379,9 @@ We are stopping on the included end-point as its part of the index Datetime Indexing ~~~~~~~~~~~~~~~~~ -Indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the sematics of *including both endpoints*. +Indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*. -These ``datetime`` objects are specific ``hours, minutes,`` and ``seconds`` even though they were not explicity specified (they are ``0``). +These ``datetime`` objects are specific ``hours, minutes,`` and ``seconds`` even though they were not explicitly specified (they are ``0``). .. ipython:: python @@ -1460,7 +1460,7 @@ Series of timedeltas with ``NaT`` values are supported y = s - s.shift() y -Elements can be set to ``NaT`` using ``np.nan`` analagously to datetimes +Elements can be set to ``NaT`` using ``np.nan`` analogously to datetimes .. ipython:: python diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst index 630e40c4ebfa2..69e04483cb47d 100644 --- a/doc/source/visualization.rst +++ b/doc/source/visualization.rst @@ -317,7 +317,7 @@ The return type of ``boxplot`` depends on two keyword arguments: ``by`` and ``re When ``by`` is ``None``: * if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines ` is returned.
The keys are "boxes", "caps", "fliers", "medians", and "whiskers". - This is the deafult. + This is the default. * if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes ` containing the boxplot is returned. * if ``return_type`` is ``'both'`` a namedtuple containing the :class:`matplotlib Axes ` and :class:`matplotlib Lines ` is returned @@ -763,7 +763,7 @@ layout and formatting of the returned plot: plt.figure(); ts.plot(style='k--', label='Series'); For each kind of plot (e.g. `line`, `bar`, `scatter`) any additional arguments -keywords are passed alogn to the corresponding matplotlib function +keywords are passed along to the corresponding matplotlib function (:meth:`ax.plot() `, :meth:`ax.bar() `, :meth:`ax.scatter() `). These can be used
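As a minimal sketch of this pass-through behavior (the data below is made up for illustration), ``style`` and ``label`` are interpreted by pandas itself, while unrecognized keywords such as ``linewidth`` and ``alpha`` are forwarded to the underlying matplotlib call:

.. code-block:: python

   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt

   ts = pd.Series(np.random.randn(100),
                  index=pd.date_range('2000-01-01', periods=100))

   # style/label are consumed by pandas; linewidth/alpha are not,
   # so they are passed along to the corresponding ax.plot() call.
   ts.cumsum().plot(style='k--', label='Series', linewidth=2, alpha=0.5)
   plt.legend()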