
Commit 42f9277

Merge pull request pandas-dev#779 from manahl/shashank88-patch-1
Add some more documentation
2 parents 84aa655 + fb2fc9d

5 files changed: +113 -15 lines


docs/contributing.md

+15
## Contributing to Arctic Development

* Feel free to pick up an issue from the bug tracker (https://github.com/manahl/arctic/issues), or file a new issue and assign it to yourself, so we don't duplicate work on the same issue.

* Local installation
  * Clone the repo locally.
  * Create a virtualenv, e.g. `virtualenv .venv -p python3`.
  * Activate the virtualenv, e.g. `source .venv/bin/activate`.
  * Run `python setup.py install` to install dependencies into your virtualenv.
  * Arctic should now be ready to use locally; you can test it by importing it in your Python interpreter.

* After you have made changes, run the tests with `python setup.py test`. To run a specific test, use something like `python setup.py test -a tests/integration/<test_name>`.

* Run pycodestyle locally to make sure your changes pass the coding style checks.

docs/faq.md

+21-8
Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per [MongoDB](https://www.mongodb.org/) instance.
Other benefits are:

* Serializes a number of data types, e.g. Pandas DataFrames, NumPy arrays, and arbitrary Python objects via pickling, so you don't have to handle different datatypes manually.
* Uses LZ4 compression by default on the client side to get big savings on network / disk.
* Allows you to version different stages of an object and snapshot the state (in some ways similar to git), so you can freely experiment and then just revert to a snapshot. [VersionStore only]
* Does the chunking (breaking a DataFrame into smaller parts) for you.
* Adds a concept of Users and per-User Libraries, which can build on Mongo's auth.
* Has different types of Stores, each with its own benefits. E.g. VersionStore allows you to version and snapshot data, TickStore is for storage and highly efficient retrieval of streaming data, and ChunkStore allows you to chunk and efficiently retrieve ranges of chunks. If nothing suits you, feel free to use vanilla Mongo commands with BSONStore.
* Restricts data access to Mongo and thus prevents ad hoc queries on unindexed / unsharded collections.
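The chunking bullet above can be illustrated in plain Python. This is a minimal conceptual sketch of splitting rows into fixed-size chunks, not Arctic's actual implementation:

```python
# Conceptual sketch only -- not Arctic's chunking code.
# Splits a sequence of rows into fixed-size chunks, the way a store
# breaks a DataFrame into smaller parts before writing.

def chunk_rows(rows, chunk_size):
    """Yield successive chunks of at most `chunk_size` rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

rows = list(range(10))              # stand-in for DataFrame rows
chunks = list(chunk_rows(rows, 4))
print(chunks)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be compressed and stored as its own document, which is what makes range reads fast.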
## Differences between VersionStore and TickStore?

TickStore is for tick-style data, generally arriving via streaming; VersionStore is for playing around with data. It keeps versions, so you can 'undo' changes and keep track of updates.

## Which Store should I use?

* VersionStore: This is the default Store type. It gives you the ability to version and snapshot your objects, while doing the serialization, compression, etc. alongside. This is useful as you can basically play with your data and revert to an older state if needed.
* ChunkStore: Use ChunkStore when you don't care about versioning and want to store DataFrames in user-defined chunks with fast reads.
* TickStore: When you are storing constant tick data (e.g. buy / sell info from exchanges). This generally plays well with Kafka / other message brokers.
* BSONStore: For basically using raw Mongo operations via Arctic. Can be used for storing ad hoc data.
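The decision guide above can be captured as a small rule-of-thumb helper. This is purely a hypothetical sketch -- neither the function nor its flags are part of Arctic's API; it just returns the store type a user would typically pick:

```python
# Hypothetical decision helper mirroring the guide above.
# Not part of Arctic's API -- illustration only.

def choose_store(tick_data=False, need_versioning=False,
                 chunked_dataframes=False):
    if tick_data:
        return 'TickStore'      # constant streaming tick data
    if need_versioning:
        return 'VersionStore'   # default: versions + snapshots
    if chunked_dataframes:
        return 'ChunkStore'     # user-defined chunks, fast range reads
    return 'BSONStore'          # raw Mongo operations / ad hoc data

print(choose_store(need_versioning=True))   # VersionStore
```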

## Why Mongo?

## I'm running Mongo in XXXX setup - what performance should I expect?

We're constantly asked what the expected performance of Arctic is/should be for given configurations and Mongo cluster setups. It's hard to know for sure given the enormous number of ways Mongo, networks, machines, workstations, etc. can be configured. MongoDB performance tuning is outside the scope of this library, but countless tutorials and examples are available via a quick search of the Internet.

## Thread safety

VersionStore is thread safe, and interrupted operations should never corrupt data, because the data segments are written first and the pointers to them afterwards. An interrupted write can, however, leak orphaned data segments.
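The segments-before-pointers ordering described above can be shown with a toy in-memory store. This is purely conceptual, not Arctic's storage code: if a writer dies after writing segments but before publishing the pointer, readers still see the previous version, and the new segments are merely orphaned.

```python
# Toy illustration of write ordering: data segments first, pointer last.
# Purely conceptual -- not Arctic's actual storage implementation.

segments = {}   # segment_id -> data
pointers = {}   # symbol -> list of segment_ids (the "published" version)

def write_version(symbol, version, parts):
    seg_ids = []
    for i, part in enumerate(parts):
        seg_id = (symbol, version, i)
        segments[seg_id] = part      # step 1: write the data segments
        seg_ids.append(seg_id)
    pointers[symbol] = seg_ids       # step 2: publish the pointer

write_version('SYM', 1, ['a', 'b'])

# Simulate a crash mid-write of version 2: a segment was written,
# but the pointer was never published.
segments[('SYM', 2, 0)] = 'c'

# Readers still see the intact version 1; the version-2 data is leaked.
data = [segments[s] for s in pointers['SYM']]
print(data)   # ['a', 'b']
```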

docs/index.md

+17-5
Arctic is a timeseries / dataframe database that sits atop MongoDB. Arctic supports serialization of a number of datatypes for storage in the mongo document model.
## Why use Arctic?

Some of the reasons to use Arctic are:

* Serializes a number of data types, e.g. Pandas DataFrames, NumPy arrays, and arbitrary Python objects via pickling, so you don't have to handle different datatypes manually.
* Uses LZ4 compression by default on the client side to get big savings on network / disk.
* Allows you to version different stages of an object and snapshot the state (in some ways similar to git), so you can freely experiment and then just revert to a snapshot. [VersionStore only]
* Does the chunking (breaking a DataFrame into smaller parts) for you.
* Adds a concept of Users and per-User Libraries, which can build on Mongo's auth.
* Has different types of Stores, each with its own benefits. E.g. VersionStore allows you to version and snapshot data, TickStore is for storage and highly efficient retrieval of streaming data, and ChunkStore allows you to chunk and efficiently retrieve ranges of chunks. If nothing suits you, feel free to use vanilla Mongo commands with BSONStore.
* Restricts data access to Mongo and thus prevents ad hoc queries on unindexed / unsharded collections.

Head over to the FAQ and James's presentation given below for more details.
## Basic Operations

Arctic provides a [wrapper](../arctic/arctic.py) for handling connections to Mongo. The `Arctic` class is what actually connects to Arctic.
Other basic methods:

* `library.list_symbols()`
  - Does what you might expect - lists all the symbols in the given library
  ```['US_EQUITIES', 'EUR_EQUITIES', ...]```
* `arctic.get_quota(library_name)`, `arctic.set_quota(library_name, quota_in_bytes)`
  - Arctic internally sets quotas on libraries so they do not consume too much space. You can check and set quotas with these two methods. Note that these operate on the `Arctic` object, not on libraries.
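Since quotas are specified in bytes, it is worth computing the figure explicitly rather than hard-coding a long literal. A small sketch (the 10 GiB figure is just an example value):

```python
# Quotas are passed to set_quota() in bytes; compute them explicitly.
# The 10 GiB figure here is an arbitrary example.
GIB = 1024 ** 3
quota_in_bytes = 10 * GIB
# then: arctic.set_quota(library_name, quota_in_bytes)
print(quota_in_bytes)   # 10737418240
```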

docs/tickstore.md

+59-1
## Reading and Writing data with Tickstore

Sample tick:
```python
sample_ticks = [
    {
        'ASK': 1545.25,
        'ASKSIZE': 1002.0,
        'BID': 1545.0,
        'BIDSIZE': 55.0,
        'CUMVOL': 2187387.0,
        'DELETED_TIME': 0,
        'INSTRTYPE': 'FUT',
        'PRICE': 1545.0,
        'SIZE': 1.0,
        'TICK_STATUS': 0,
        'TRADEHIGH': 1561.75,
        'TRADELOW': 1537.25,
        'index': 1185076787070
    },
    {
        'CUMVOL': 354.0,
        'DELETED_TIME': 0,
        'PRICE': 1543.75,
        'SIZE': 354.0,
        'TRADEHIGH': 1543.75,
        'TRADELOW': 1543.75,
        'index': 1185141600600
    }
]
```
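The `index` field in each sample tick is a millisecond Unix-epoch timestamp. A quick stdlib check of the first sample's index confirms this:

```python
from datetime import datetime, timezone

# 'index' in the sample ticks above is milliseconds since the Unix epoch.
ts_ms = 1185076787070
ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(ts.date())   # 2007-07-22
```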
### Writing and reading to tickstore

```python
tickstore_lib.write('FEED::SYMBOL', sample_ticks)
df = tickstore_lib.read('FEED::SYMBOL', columns=['BID', 'ASK', 'PRICE'])
```
Another example, with a timezone-aware datetime index:

```python
from datetime import datetime as dt

import pandas as pd

from arctic.date import mktz

data = [{'A': 120, 'D': 1}, {'A': 122, 'B': 2.0}, {'A': 3, 'B': 3.0, 'D': 1}]
tick_index = [dt(2013, 6, 1, 12, 00, tzinfo=mktz('UTC')),
              dt(2013, 6, 1, 11, 00, tzinfo=mktz('UTC')),  # out-of-order
              dt(2013, 6, 1, 13, 00, tzinfo=mktz('UTC'))]
data = pd.DataFrame(data, index=tick_index)

tickstore_lib._chunk_size = 3
tickstore_lib.write('SYM', data)
tickstore_lib.read('SYM', columns=None)
```
## Usecases

* Storing billions of ticks in a compressed way with fast querying by date ranges.
* Customizable chunk sizes. The default is 100k ticks per chunk, which should fit easily in a single mongo doc for fast reads.
* Structured to work with financial tick data stored on a per-symbol basis. Generally used with Kafka, a Redis queue, or some other message broker for streaming data.
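The default 100k chunk size means the number of chunks (and thus roughly the number of mongo docs per symbol) scales simply with tick volume. A back-of-the-envelope sketch, with an assumed example volume of 250M ticks:

```python
import math

# With the default chunk size of 100k ticks, estimate how many chunks
# a symbol's history needs. The 250M figure is an arbitrary example.
CHUNK_SIZE = 100_000
n_ticks = 250_000_000
n_chunks = math.ceil(n_ticks / CHUNK_SIZE)
print(n_chunks)   # 2500
```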
See [James's talk](https://vimeo.com/showcase/3660528/video/145842301) for more details.

mkdocs.yml

+1-1
@@ -19,12 +19,12 @@ pages:
   - Introduction: 'index.md'
   - Quickstart: 'quickstart.md'
   - Configuration: 'configuration.md'
-  - Developing on mac: 'developing-conda-mac.md'
   - Storage Engines:
     - VersionStore: 'versionstore.md'
     - TickStore: 'tickstore.md'
     - ChunkStore: 'chunkstore.md'
     - ChunkStore API Reference: 'chunkstore_api.md'
+  - Contributing to Arctic: 'contributing.md'
   - Releasing: 'releasing.md'
   - Users: 'users.md'
   - FAQ: 'faq.md'
