Commit 0ad2c0d

ms041223 and mroeschke authored
DOC: Adding ArcticDB to the ecosystem.md page (pandas-dev#59830)
* adding ArcticDB to the ecosystem.md page

* Update web/pandas/community/ecosystem.md

Co-authored-by: Matthew Roeschke <[email protected]>

* making pandas lower case

---------

Co-authored-by: Matthew Roeschke <[email protected]>
1 parent a851438 commit 0ad2c0d

1 file changed: +91 -0 lines changed

web/pandas/community/ecosystem.md

@@ -367,6 +367,97 @@ pandas-gbq provides high performance reads and writes to and from
these methods were exposed as `pandas.read_gbq` and `DataFrame.to_gbq`.
Use `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq`, instead.

### [ArcticDB](https://github.com/man-group/ArcticDB)

ArcticDB is a serverless DataFrame database engine designed for the Python data science ecosystem. It lets you store, retrieve, and process pandas DataFrames at scale. Its storage engine is designed for object storage and also supports local-disk storage using LMDB. ArcticDB requires no infrastructure beyond a running Python environment and access to object storage, and can be installed in seconds. Full documentation is available [here](https://docs.arcticdb.io/latest/).

#### ArcticDB Terminology

ArcticDB is structured to provide a scalable and efficient way to manage and retrieve DataFrames. It is organized into several key components:

- `Object Store`: A collection of libraries, used to separate logical environments from each other. Analogous to a database server.
- `Library`: Contains multiple symbols grouped in a certain way (by user, market, etc.). Analogous to a database.
- `Symbol`: The atomic unit of data storage, identified by a string name. Data stored under a symbol strongly resembles a pandas DataFrame. Analogous to a table.
- `Version`: Every modifying action (write, append, update) performed on a symbol creates a new version of that symbol.
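
As a purely illustrative sketch of how these four terms nest, the toy in-memory classes below model an object store holding libraries, each library holding symbols, and each write to a symbol producing a new version. `ToyObjectStore` and `ToyLibrary` are hypothetical names invented for this example; they are not ArcticDB's API and say nothing about how ArcticDB stores data internally.

```python
class ToyLibrary:
    """Toy stand-in for a library: maps symbol names to a version history."""

    def __init__(self):
        self._symbols = {}

    def write(self, symbol, data):
        """Store data under symbol and return the new version number."""
        history = self._symbols.setdefault(symbol, [])
        history.append(data)
        return len(history) - 1

    def read(self, symbol, as_of=None):
        """Return the latest version, or the version numbered as_of."""
        history = self._symbols[symbol]
        return history[-1] if as_of is None else history[as_of]


class ToyObjectStore:
    """Toy stand-in for an object store: a collection of named libraries."""

    def __init__(self):
        self._libraries = {}

    def get_library(self, name):
        """Return the named library, creating it on first use."""
        return self._libraries.setdefault(name, ToyLibrary())


store = ToyObjectStore()
lib = store.get_library("markets")

# Each modifying action on the symbol "prices" yields a new version.
v0 = lib.write("prices", [1, 2, 3])
v1 = lib.write("prices", [1, 2, 3, 4])

latest = lib.read("prices")             # most recent version
original = lib.read("prices", as_of=0)  # the first version is still readable
```

Older versions stay addressable after later writes, which is the property the `Version` entry describes.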
#### Installation

To install, run:

```console
pip install arcticdb
```

To get started, we can import ArcticDB and instantiate it:

```python
import arcticdb as adb
import numpy as np
import pandas as pd
# this will set up the storage using the local file system
arctic = adb.Arctic("lmdb://arcticdb_test")
```

> **Note:** ArcticDB supports any S3 API compatible storage, including AWS, as well as Azure Blob storage.
> It also supports LMDB for local, file-based storage. To use LMDB, pass an LMDB path as the URI: `adb.Arctic('lmdb://path/to/desired/database')`.

#### Library Setup

ArcticDB is geared towards storing many (potentially millions of) tables. Individual tables (DataFrames) are called symbols and are stored in collections called libraries. A single library can store many symbols. A library must be initialized before use:

```python
lib = arctic.get_library('sample', create_if_missing=True)
```

#### Writing Data to ArcticDB

Now that we have a library set up, we can start reading and writing data. ArcticDB has a set of simple functions for DataFrame storage. Let's write a DataFrame to storage.

```python
# build a small DataFrame with one column per common dtype
df = pd.DataFrame(
    {
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0, dtype="float64"),
        "e": [True, False, True],
        "f": pd.date_range("20130101", periods=3)
    }
)

# display the DataFrame and its column dtypes
df
df.dtypes
```

Write the DataFrame to ArcticDB under the symbol name `test`:

```python
write_record = lib.write("test", df)
```

> **Note:** When writing pandas DataFrames, ArcticDB supports the following index types:
>
> - `pandas.Index` containing int64 (or the corresponding dedicated types Int64Index, UInt64Index)
> - `RangeIndex`
> - `DatetimeIndex`
> - `MultiIndex` composed of above supported types
>
> The "row" concept in `head`/`tail` refers to the row number ('iloc'), not the value in the `pandas.Index` ('loc').
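
The difference between row number and index value can be seen in plain pandas. The sketch below uses a hypothetical frame whose int64 index labels are deliberately out of order; it only illustrates the 'iloc' versus 'loc' distinction mentioned in the note and does not call ArcticDB.

```python
import pandas as pd

# Hypothetical frame whose int64 index labels are deliberately out of order.
df = pd.DataFrame({"x": [10, 20, 30]}, index=pd.Index([2, 0, 1], dtype="int64"))

# Positional selection ('iloc'): the first two rows in stored order.
first_two_rows = df.iloc[:2]

# Label selection ('loc'): the rows whose index values are 0 and 1.
rows_labelled_0_and_1 = df.loc[[0, 1]]

print(first_two_rows["x"].tolist())         # [10, 20]
print(rows_labelled_0_and_1["x"].tolist())  # [20, 30]
```

Under the positional reading described in the note, asking for the first two rows corresponds to the `iloc` result, regardless of the index labels.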
#### Reading Data from ArcticDB

Read the data back from storage:

```python
read_record = lib.read("test")
# the stored DataFrame is available on the `data` attribute
read_record.data
# dtypes of the DataFrame read back from storage
read_record.data.dtypes
```

ArcticDB also supports appending to, updating, and querying stored data, with results returned as pandas DataFrames. More information is available [here](https://docs.arcticdb.io/latest/api/query_builder/).
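
A minimal sketch of appending and querying follows, assuming the `Library.append` method and the `QueryBuilder` class described in the ArcticDB documentation. The library name `sketch`, the symbol name `prices`, the sample data, and the temporary LMDB directory are all hypothetical. The `arcticdb` import is guarded so the snippet can be pasted into an environment where ArcticDB is not installed, in which case the demonstration is skipped.

```python
import tempfile

import pandas as pd

try:
    import arcticdb as adb
except ImportError:  # ArcticDB not installed: the demonstration is skipped
    adb = None


def append_and_query(uri):
    """Write a symbol, append to it, then read back a filtered view."""
    arctic = adb.Arctic(uri)
    lib = arctic.get_library("sketch", create_if_missing=True)

    first = pd.DataFrame(
        {"price": [99.0, 101.0]},
        index=pd.date_range("2024-01-01", periods=2),
    )
    later = pd.DataFrame(
        {"price": [105.0]},
        index=pd.date_range("2024-01-03", periods=1),
    )

    lib.write("prices", first)               # creates the symbol
    appended = lib.append("prices", later)   # adds rows as a new version

    # Build a filter that keeps only the rows where price exceeds 100.
    q = adb.QueryBuilder()
    q = q[q["price"] > 100]
    filtered = lib.read("prices", query_builder=q).data

    return appended.version, filtered


if adb is not None:
    # Assumption: a throwaway local LMDB directory stands in for real storage.
    version, filtered = append_and_query(f"lmdb://{tempfile.mkdtemp()}")
    print(version)
    print(filtered)
```

The appended frame uses a `DatetimeIndex` that continues after the existing data, which keeps the example within the supported index types listed earlier.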

## Out-of-core

### [Bodo](https://bodo.ai/)
