Commit cdfae81

Merge pull request #4888 from cpcloud/readme-fixup
CLN: move README.rst to markdown
2 parents ef5ca56 + 0bc568f commit cdfae81

2 files changed, +213 -199 lines changed

README.md

+213
@@ -0,0 +1,213 @@
# pandas: powerful Python data analysis toolkit

![Travis-CI Build Status](https://travis-ci.org/pydata/pandas.png)

## What is it
**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

## Main Features
Here are just a few of the things that pandas does well:

- Easy handling of [**missing data**][missing-data] (represented as
  `NaN`) in floating point as well as non-floating point data
- Size mutability: columns can be [**inserted and
  deleted**][insertion-deletion] from DataFrame and higher dimensional
  objects
- Automatic and explicit [**data alignment**][alignment]: objects can
  be explicitly aligned to a set of labels, or the user can simply
  ignore the labels and let `Series`, `DataFrame`, etc. automatically
  align the data for you in computations
- Powerful, flexible [**group by**][groupby] functionality to perform
  split-apply-combine operations on data sets, for both aggregating
  and transforming data
- Make it [**easy to convert**][conversion] ragged,
  differently-indexed data in other Python and NumPy data structures
  into DataFrame objects
- Intelligent label-based [**slicing**][slicing], [**fancy
  indexing**][fancy-indexing], and [**subsetting**][subsetting] of
  large data sets
- Intuitive [**merging**][merging] and [**joining**][joining] data
  sets
- Flexible [**reshaping**][reshape] and [**pivoting**][pivot-table] of
  data sets
- [**Hierarchical**][mi] labeling of axes (possible to have multiple
  labels per tick)
- Robust IO tools for loading data from [**flat files**][flat-files]
  (CSV and delimited), [**Excel files**][excel], [**databases**][db],
  and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
- [**Time series**][timeseries]-specific functionality: date range
  generation and frequency conversion, moving window statistics,
  moving window linear regressions, date shifting and lagging, etc.


[missing-data]: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data
[insertion-deletion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion
[alignment]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures
[groupby]: http://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine
[conversion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
[slicing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges
[fancy-indexing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix
[subsetting]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
[merging]: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
[joining]: http://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index
[reshape]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables
[pivot-table]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
[mi]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex
[flat-files]: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files
[excel]: http://pandas.pydata.org/pandas-docs/stable/io.html#excel-files
[db]: http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries
[hdfstore]: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables
[timeseries]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality
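
A few of the features listed above (alignment, missing data, group by, and merging) can be illustrated with a short sketch. It assumes a reasonably recent pandas and NumPy installation; the index labels, column names, and values are invented for illustration and are not part of this README.

```python
import numpy as np
import pandas as pd

# Automatic alignment: values are matched by label, not by position.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b  # "x" has no partner in b, so its result is NaN

# Missing data: NaN entries are skipped by default in aggregations.
df = pd.DataFrame({"key": ["a", "b", "a", "b"],
                   "value": [1.0, np.nan, 3.0, 4.0]})
means = df.groupby("key")["value"].mean()  # split-apply-combine

# Database-style merge on a shared column (illustrative column names).
left = pd.DataFrame({"key": ["a", "b"], "left_val": [1, 2]})
right = pd.DataFrame({"key": ["a", "c"], "right_val": [3, 4]})
merged = left.merge(right, on="key", how="inner")

print(total)
print(means)
print(merged)
```

Only the key `"a"` appears in both frames, so the inner merge keeps a single row.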

## Where to get it
The source code is currently hosted on GitHub at:
http://github.com/pydata/pandas

Binary installers for the latest released version are available at the Python
package index

http://pypi.python.org/pypi/pandas/

And via `easy_install`:

```sh
easy_install pandas
```

or `pip`:

```sh
pip install pandas
```

## Dependencies
- [NumPy](http://www.numpy.org): 1.6.1 or higher
- [python-dateutil](http://labix.org/python-dateutil): 1.5 or higher
- [pytz](http://pytz.sourceforge.net)
  - Needed for time zone support with ``pandas.date_range``
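
The minimum versions above can be checked against the current environment with a small standard-library sketch. The helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical, the version parsing is deliberately naive (numeric components only), and the distribution names are assumed to match the PyPI names `numpy`, `python-dateutil`, and `pytz`.

```python
import re
from importlib import metadata

# Minimums copied from the Dependencies list; pytz has no stated minimum.
MINIMUMS = {"numpy": "1.6.1", "python-dateutil": "1.5", "pytz": None}


def parse_version(text):
    """Turn '1.6.1' into (1, 6, 1); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    if minimum is None:
        return True
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_dependencies(minimums=MINIMUMS):
    """Return a dict mapping each distribution to 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(name, status)
```

For anything beyond a quick check, a real version parser handles pre-releases and other edge cases that this sketch ignores.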

### Highly Recommended Dependencies
- [numexpr](http://code.google.com/p/numexpr/)
  - Needed to accelerate some expression evaluation operations
  - Required by PyTables
- [bottleneck](http://berkeleyanalytics.com/bottleneck)
  - Needed to accelerate certain numerical operations

### Optional dependencies
- [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
- [SciPy](http://www.scipy.org): miscellaneous statistical functions
- [PyTables](http://www.pytables.org): necessary for HDF5-based storage
- [matplotlib](http://matplotlib.sourceforge.net/): for plotting
- [statsmodels](http://statsmodels.sourceforge.net/)
  - Needed for parts of `pandas.stats`
- [openpyxl](http://packages.python.org/openpyxl/), [xlrd/xlwt](http://www.python-excel.org/)
  - openpyxl version 1.6.1 or higher, for writing .xlsx files
  - xlrd >= 0.9.0
  - Needed for Excel I/O
- [boto](https://pypi.python.org/pypi/boto): necessary for Amazon S3 access.
- One of the following combinations of libraries is needed to use the
  top-level [`pandas.read_html`][read-html-docs] function:
  - [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] (Any
    recent version of [html5lib][html5lib] is okay.)
  - [BeautifulSoup4][BeautifulSoup4] and [lxml][lxml]
  - [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] and [lxml][lxml]
  - Only [lxml][lxml], although see [HTML reading gotchas][html-gotchas]
    for reasons as to why you should probably **not** take this approach.

#### Notes about HTML parsing libraries
- If you install [BeautifulSoup4][BeautifulSoup4] you must install
  either [lxml][lxml] or [html5lib][html5lib] or both.
  `pandas.read_html` will **not** work with *only* `BeautifulSoup4`
  installed.
- You are strongly encouraged to read [HTML reading
  gotchas][html-gotchas]. It explains issues surrounding the
  installation and usage of the above three libraries.
- You may need to install an older version of
  [BeautifulSoup4][BeautifulSoup4]:
  - Versions 4.2.1, 4.1.3 and 4.0.2 have been confirmed for 64 and
    32-bit Ubuntu/Debian
- Additionally, if you're using [Anaconda][Anaconda] you should
  definitely read [the gotchas about HTML parsing][html-gotchas]
  libraries
- If you're on a system with `apt-get` you can do

  ```sh
  sudo apt-get build-dep python-lxml
  ```

  to get the necessary dependencies for installation of [lxml][lxml].
  This will prevent further headaches down the line.

[html5lib]: https://github.com/html5lib/html5lib-python "html5lib"
[BeautifulSoup4]: http://www.crummy.com/software/BeautifulSoup "BeautifulSoup4"
[lxml]: http://lxml.de
[Anaconda]: https://store.continuum.io/cshop/anaconda
[NumPy]: http://numpy.scipy.org/
[html-gotchas]: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-table-parsing
[read-html-docs]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html#pandas.io.html.read_html
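
The library combinations described above for `pandas.read_html` can be expressed as a small check. This is only a sketch of the rules as stated in this README, not pandas' own detection logic; the function names and return labels are hypothetical, and it assumes the usual import names `bs4`, `html5lib`, and `lxml`. The README does not say what happens with `html5lib` and `lxml` but no BeautifulSoup4, so the sketch treats that case as lxml-only.

```python
from importlib.util import find_spec

PARSER_MODULES = ("bs4", "html5lib", "lxml")


def detect_installed(modules=PARSER_MODULES):
    """Return the subset of parser modules that can be imported here."""
    return {name for name in modules if find_spec(name) is not None}


def classify(installed):
    """Classify a set of installed module names per the combinations above."""
    has_bs4 = "bs4" in installed
    has_html5lib = "html5lib" in installed
    has_lxml = "lxml" in installed
    if has_bs4 and (has_html5lib or has_lxml):
        return "supported"
    if has_lxml:
        # Works according to the README, but it advises against this setup.
        return "lxml-only"
    # Covers BeautifulSoup4 alone, html5lib alone, and nothing installed.
    return "unsupported"


if __name__ == "__main__":
    found = detect_installed()
    print(sorted(found), "->", classify(found))
```

Keeping `classify` separate from `detect_installed` makes the rules easy to exercise without installing or removing any packages.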

## Installation from sources
To install pandas from source you need Cython in addition to the normal
dependencies above. Cython can be installed from pypi:

```sh
pip install cython
```

In the `pandas` directory (same one where you found this file after
cloning the git repo), execute:

```sh
python setup.py install
```

or for installing in [development mode](http://www.pip-installer.org/en/latest/usage.html):

```sh
python setup.py develop
```

Alternatively, you can use `pip` if you want all the dependencies pulled
in automatically (the `-e` option is for installing it in [development
mode](http://www.pip-installer.org/en/latest/usage.html)):

```sh
pip install -e .
```

On Windows, you will need to install MinGW and execute:

```sh
python setup.py build --compiler=mingw32
python setup.py install
```

See http://pandas.pydata.org/ for more information.
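
After any of the installation routes above, one way to confirm that the package is importable is a short check like the following. It is a sketch rather than an official verification step; the function name `installed_pandas_version` is hypothetical, and it returns `None` instead of raising when pandas cannot be imported.

```python
import importlib


def installed_pandas_version():
    """Return pandas' version string, or None if pandas cannot be imported."""
    try:
        module = importlib.import_module("pandas")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = installed_pandas_version()
    if version is None:
        print("pandas is not importable from this interpreter")
    else:
        print("pandas", version)
```

Running it with the same interpreter used for the install helps catch the common case where the package landed in a different environment.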

## License
BSD

## Documentation
The official documentation is hosted on PyData.org: http://pandas.pydata.org/

The Sphinx documentation should provide a good starting point for learning how
to use the library. Expect the docs to continue to expand as time goes on.

## Background
Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and
has been under active development since then.

## Discussion and Development
Since pandas development is related to a number of other scientific
Python projects, questions are welcome on the scipy-user mailing
list. Specialized discussions or design issues should take place on
the pystatsmodels mailing list / Google group, where
``scikits.statsmodels`` and other libraries will also be discussed:

http://groups.google.com/group/pystatsmodels
