DOC-2544: Adding new doctest to support updated VSS article #2886
Merged

Commits (114)
5e258a1 (ant1fact) Add missing `Union` type in method `StreamCommands.xclaim()` (#2553)
e39c7ba (kristjanvalur) Simplify the sync SocketBuffer, add type hints (#2543)
42604b6 (rbowen) trivial typo fix (#2566)
9e6a9b5 (gmbnomis) Fix unlink in cluster pipeline (#2562)
428d609 (Vivanov98) Fix issue 2540: Synchronise concurrent command calls to single-client…
31a1c0b (kosuke-zhang) Fix: tuple function cannot be passed more than one argument (#2573)
ffbe879 (prokazov) Use hiredis::pack_command to serialized the commands. (#2570)
9e00b91 (SoulPancake) Fix issue 2567: NoneType check before raising exception (#2569)
e7306aa (kristjanvalur) Fix issue 2349: Let async HiredisParser finish parsing after a Connec…
fcd8f98 (uglide) Add TS.MGET example for OS Redis Cluster (#2507)
f517287 (jmcbailey) Fix issue with `pack_commands` returning an empty byte sequence (#2416)
5cb5712 (dvora-h) Version 4.5.0 (#2580)
2b470cb (prokazov) Fix #2581 UnixDomainSocketConnection' object has no attribute '_comma…
fd7a79d (dvora-h) Version 4.5.1 (#2586)
e9ad2a3 (Galtozzy) Fix for `lpop` and `rpop` return typing (#2590)
6c708c2 (uglide) Update README to make pip install copy-pastable on zsh (#2584)
b546a9a (davemcphee) update json().arrindex() default values (#2611)
5588ae0 (chayim) Speeding up the protocol parsing (#2596)
3edd49b (barshaul) Fixed CredentialsProvider examples (#2587)
6d1061f (CrimsonGlory) ConnectionPool SSL example (#2605)
a372ba4 (ryin1) [types] update return type of smismember to list[int] (#2617)
8bfd492 (aksinha334) Making search document subscriptable (#2615)
91ab12a (thebarbershop) Remove redundant assignment. (#2620)
25e85e5 (sileht) fix: replace async_timeout by asyncio.timeout (#2602)
c61eeb2 (chayim) Adding supported redis/library details (#2621)
d63313b (zakaf) add queue_class to REDIS_ALLOWED_KEYS (#2577)
c871723 (chayim) pypy-3.9 CI (#2608)
7d474f9 (woutdenolf) introduce AbstractConnection so that UnixDomainSocketConnection can c…
1b2f408 (kristjanvalur) Fix behaviour of async PythonParser to match RedisParser as for issue…
318b114 (dvora-h) Version 4.5.2 (#2627)
66a4d6b (chayim) AsyncIO Race Condition Fix (#2641)
4802530 (bellini666) fix: do not use asyncio's timeout lib before 3.11.2 (#2659)
4856813 (woutdenolf) UnixDomainSocketConnection missing constructor argument (#2630)
326bb1c (chayim) removing useless files (#2642)
6d886d7 (shacharPash) Fix issue 2660: PytestUnraisableExceptionWarning from asycio client (…
5acbde3 (chayim) Fixing cancelled async futures (#2666)
ef3f086 (dvora-h) Fix async (#2673)
e1017fd (dvora-h) Version 4.5.4 (#2674)
7ae8464 (mirekdlugosz) Really do not use asyncio's timeout lib before 3.11.2 (#2699)
6a4240b (oranav) asyncio: Fix memory leak caused by hiredis (#2693) (#2694)
db9a85c (mzdehbashi-github) Update example of Redisearch creating index (#2703)
7fc4c76 (tylerhutcherson) Improving Vector Similarity Search Example (#2661)
d6bb457 (felipou) Fix incorrect usage of once flag in async Sentinel (#2718)
fddd3d6 (AYMENJD) Fix topk list example. (#2724)
8e0b84d (scoopex) Improve error output for master discovery (#2720)
8b58ebb (shacharPash) return response in case of KeyError (#2628)
bf528fc (shacharPash) Add WITHSCORES to ZREVRANK Command (#2725)
1ca223a (Avasam) Fix `ClusterCommandProtocol` not itself being marked as a protocol (#…
ac15d52 (Anthchirp) Fix potential race condition during disconnection (#2719)
a7857e1 (kristjanvalur) add "address_remap" feature to RedisCluster (#2726)
e52fd67 (shacharPash) nermina changes from NRedisStack (#2736)
6d32503 (NickG123) Updated AWS Elasticache IAM Connection Example (#2702)
ffb2b83 (chayim) pinning urllib3 to fix CI (#2748)
3748a8b (kristjanvalur) Add RedisCluster.remap_host_port, Update tests for CWE 404 (#2706)
906e413 (cristianmatache) Update redismodules.rst (#2747)
cfdcfd8 (SoulPancake) Add support for cluster myshardid (#2704)
9370711 (dvora-h) clean warnings (#2731)
093232d (dvora-h) fix parse_slowlog_get (#2732)
c0833f6 (kristjanvalur) Optionally disable disconnects in read_response (#2695)
8c06d67 (aciddust) Add client no-touch (#2745)
984b733 (dvora-h) fix create single_connection_client from url (#2752)
4a4566b (aciddust) Fix `xadd` allow non negative maxlen (#2739)
f056118 (dvora-h) Version 4.5.5 (#2753)
35b7e09 (kristjanvalur) Kristjan/issue #2754: Add missing argument to SentinelManagedConnecti…
2d9b5ac (shacharPash) support JSON.MERGE Command (#2761)
db7b9dd (kristjanvalur) Issue #2749: Remove unnecessary __del__ handlers (#2755)
d95d8a2 (bodevone) Add WITHSCORE to ZRANK (#2758)
4d396f8 (shacharPash) Fix JSON.MERGE Summary (#2786)
3cdecc1 (Smit-Parmar) Fixed key error in parse_xinfo_stream (#2788)
29dfbb2 (bmacphee) insert newline to prevent sphinx from assuming code block (#2796)
2bb7f10 (bmacphee) Introduce OutOfMemoryError exception for Redis write command rejectio…
53bed27 (woutdenolf) Add unit tests for the `connect` method of all Redis connection class…
4f466d6 (shahar-lev) Fix dead weakref in sentinel connection causing ReferenceError (#2767…
abc04b5 (vmihailenco) chore(documentation): fix redirects and some small cleanups (#2801)
cecf78b (aciddust) Add waitaof (#2760)
40a769e (woutdenolf) Extract abstract async connection class (#2734)
d25a96b (TheKevJames) Fix type hint for retry_on_error in async cluster (#2804)
04aadd7 (dvora-h) Fix CI (#2809)
ab617a1 (shacharPash) Support JSON.MSET Command (#2766)
9f50357 (dvora-h) Version 4.6.0 (#2810)
2732a85 (dvora-h) Merge 5.0 to master (#2849)
2c2860d (chayim) Change cluster docker to edge and enable debug command (#2853)
8e5d5ce (kristjanvalur) Fix socket garbage collection (#2859)
471f860 (chayim) Fixing doc builds (#2869)
a49e656 (chayim) RESP3 connection examples (#2863)
dc62e19 (chayim) EOL for Python 3.7 (#2852)
7d70c91 (kurtmckee) Fix a duplicate word in `CONTRIBUTING.md` (#2848)
66bad8e (dvora-h) Add sync modules (except search) tests to cluster CI (#2850)
da27f4b (pall-j) Fix timeout retrying on Redis pipeline execution (#2812)
3e50d28 (JoanFM) Fix type hints in SearchCommands (#2817)
8370c4a (kurtmckee) Add a Dependabot config to auto-update GitHub action versions (#2847)
38c7de6 (chayim) Dependabot label change (#2880)
0ed8077 (dependabot[bot]) Bump pypa/gh-action-pip-audit from 1.0.0 to 1.0.8 (#2879)
673617d (dependabot[bot]) Bump actions/upload-artifact from 2 to 3 (#2877)
a532f89 (zmievsa) Add py.typed in accordance with PEP-561 (#2738)
b0abd55 (chayim) RESP 3 feature documentation (#2872)
d5c2d1d (shacharPash) Adding support for triggered functions (TFUNCTION) (#2861)
f121cf2 (dvora-h) Add support for `CLIENT SETINFO` (#2857)
2f67926 (chayim) Version 5.0.0 (#2874)
4e4ff48 (dwdougherty) DOC-2544: Adding new doctest to support updated VSS article
28cc65c (chayim) Updating all client licenses to clearly be MIT (#2884)
e680924 (dwdougherty) DOC-2554: update import order
b3a92c4 (dwdougherty) DOC-2544: update formatting
b42d19a (dwdougherty) DOC-2544: Update import (again)
724807a (dwdougherty) Merge branch 'redis:master' into doc-2544
d23058a (dwdougherty) DOC-2544: one more attempt using Chayims advice
b8372bd (chayim) lint fixes
5f50fdc (chayim) and a reqs file
4016a67 (chayim) another missing requirement
ce0f076 (chayim) and sentence transformers
d5b42af (chayim) and the optional, unlisted dependency tabulate. Thanks conda
8dde72a (chayim) Updating README for doctests howto
30c1179 (chayim) align isort with black
894a4b6 (chayim) typo
@@ -0,0 +1,309 @@
# EXAMPLE: search_vss
# STEP_START imports
import json
import time

import numpy as np
import pandas as pd
import requests
from sentence_transformers import SentenceTransformer

import redis
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# STEP_END
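# NOTE: besides redis-py itself, this example relies on the third-party
# packages numpy, pandas, requests and sentence-transformers, plus tabulate
# (an optional pandas dependency needed by DataFrame.to_markdown); these are
# listed in the requirements file added alongside this doctest.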

# STEP_START get_data
url = "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json"
response = requests.get(url)
bikes = response.json()
# STEP_END
# REMOVE_START
assert bikes[0]["model"] == "Jigger"
# REMOVE_END

# STEP_START dump_data
json.dumps(bikes[0], indent=2)
# STEP_END

# STEP_START connect
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
# STEP_END

# STEP_START connection_test
res = client.ping()
# >>> True
# STEP_END
# REMOVE_START
assert res is True
# REMOVE_END

# STEP_START load_data
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    redis_key = f"bikes:{i:03}"
    pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
# STEP_END
# REMOVE_START
assert res == [True, True, True, True, True, True, True, True, True, True, True]
# REMOVE_END
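# The pipeline above queues all JSON.SET calls client-side and sends them to
# Redis in a single round trip; execute() returns one True per stored document.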

# STEP_START get
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
# STEP_END
# REMOVE_START
assert res == ["Summit"]
# REMOVE_END

# STEP_START get_keys
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
# STEP_END
# REMOVE_START
assert keys[0] == "bikes:001"
# REMOVE_END

# STEP_START generate_embeddings
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
# STEP_END
# REMOVE_START
assert VECTOR_DIMENSION == 768
# REMOVE_END
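# JSON.MGET with a JSONPath returns a list of lists (one per key), hence the
# flattening step above. The msmarco-distilbert-base-v4 model produces
# 768-dimensional sentence embeddings, which fixes VECTOR_DIMENSION for the
# index schema below.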

# STEP_START load_embeddings
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
# STEP_END

# STEP_START dump_example
res = client.json().get("bikes:010")
# >>>
# {
#   "model": "Summit",
#   "brand": "nHill",
#   "price": 1200,
#   "type": "Mountain Bike",
#   "specs": {
#     "material": "alloy",
#     "weight": "11.3"
#   },
#   "description": "This budget mountain bike from nHill performs well...",
#   "description_embeddings": [
#     -0.538114607334137,
#     -0.49465855956077576,
#     -0.025176964700222015,
#     ...
#   ]
# }
# STEP_END
# REMOVE_START
assert len(res["description_embeddings"]) == 768
# REMOVE_END

# STEP_START create_index
schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(
    fields=schema, definition=definition
)
# >>> 'OK'
# STEP_END
# REMOVE_START
assert res == "OK"
time.sleep(2)
# REMOVE_END
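# A FLAT vector index does exact, brute-force search, which is appropriate for
# a dataset this small; for larger datasets, the HNSW index type trades
# exactness for speed. The short sleep gives RediSearch time to index the
# existing bikes:* documents before the queries below run.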

# STEP_START validate_index
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
# STEP_END
# REMOVE_START
assert (num_docs == "11") and (indexing_failures == "0")
# REMOVE_END
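# With decode_responses=True, FT.INFO reports its statistics as strings, which
# is why num_docs and indexing_failures are compared against "11" and "0"
# rather than integers.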

# STEP_START simple_query_1
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:008', 'payload': None, 'brand': 'Peaknetic', 'model': 'Soothe Electric bike', 'price': '1950', 'description_embeddings': ...
# STEP_END
# REMOVE_START
assert all(
    item in [x.__dict__["id"] for x in res]
    for item in ["bikes:008", "bikes:009"]
)
# REMOVE_END

# STEP_START simple_query_2
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:008', 'payload': None, 'brand': 'Peaknetic', 'model': 'Soothe Electric bike', 'price': '1950'}, Document {'id': 'bikes:009', 'payload': None, 'brand': 'Peaknetic', 'model': 'Secto', 'price': '430'}]
# STEP_END
# REMOVE_START
assert all(
    item in [x.__dict__["id"] for x in res]
    for item in ["bikes:008", "bikes:009"]
)
# REMOVE_END

# STEP_START simple_query_3
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
    "id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:009', 'payload': None, 'brand': 'Peaknetic', 'model': 'Secto', 'price': '430'}]
# STEP_END
# REMOVE_START
assert all(item in [x.__dict__["id"] for x in res] for item in ["bikes:009"])
# REMOVE_END
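# @price:[0 1000] is RediSearch's inclusive numeric-range syntax, so this query
# matches only Peaknetic bikes priced at 1000 or less.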

# STEP_START def_bulk_queries
queries = [
    "Bike for small kids",
    "Best Mountain bikes for kids",
    "Cheap Mountain bike for kids",
    "Female specific mountain bike",
    "Road bike for beginners",
    "Commuter bike for people over 60",
    "Comfortable commuter bike",
    "Good bike for college students",
    "Mountain bike for beginners",
    "Vintage bike",
    "Comfortable city bike",
]
# STEP_END

# STEP_START enc_bulk_queries
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
# STEP_END
# REMOVE_START
assert len(encoded_queries) == 11
# REMOVE_END


# STEP_START define_bulk_query
def create_query_table(query, queries, encoded_queries, extra_params=None):
    # Use None instead of a mutable default argument for extra_params
    extra_params = extra_params or {}
    results_list = []
    for i, encoded_query in enumerate(encoded_queries):
        result_docs = (
            client.ft("idx:bikes_vss")
            .search(
                query,
                {
                    "query_vector": np.array(
                        encoded_query, dtype=np.float32
                    ).tobytes()
                }
                | extra_params,
            )
            .docs
        )
        for doc in result_docs:
            vector_score = round(1 - float(doc.vector_score), 2)
            results_list.append(
                {
                    "query": queries[i],
                    "score": vector_score,
                    "id": doc.id,
                    "brand": doc.brand,
                    "model": doc.model,
                    "description": doc.description,
                }
            )

    # Optional: convert the table to Markdown using Pandas
    queries_table = pd.DataFrame(results_list)
    queries_table.sort_values(
        by=["query", "score"], ascending=[True, False], inplace=True
    )
    queries_table["query"] = queries_table.groupby("query")["query"].transform(
        lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
    )
    queries_table["description"] = queries_table["description"].apply(
        lambda x: (x[:497] + "...") if len(x) > 500 else x
    )
    queries_table.to_markdown(index=False)


# STEP_END
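# RediSearch yields the raw cosine distance in the field aliased by
# "AS vector_score", so the 1 - distance step above converts it back into a
# cosine similarity, where a higher score means a closer match.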

# STEP_START run_knn_query
query = (
    Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)

create_query_table(query, queries, encoded_queries)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003... (+ 32 more results)
# STEP_END
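# The (*) pre-filter matches every document, so the KNN clause ranks all bikes
# and keeps the 3 nearest neighbours for each of the 11 queries; query dialect
# 2 or higher is required for this vector search syntax.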

# STEP_START run_hybrid_query
hybrid_query = (
    Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)
create_query_table(hybrid_query, queries, encoded_queries)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008... (+22 more results)
# STEP_END
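# Here the pre-filter (@brand:Peaknetic) is applied before the KNN ranking, so
# only Peaknetic bikes are candidates; with just two such bikes in the dataset,
# each query returns at most two rows.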

# STEP_START run_range_query
range_query = (
    Query(
        "@vector:[VECTOR_RANGE $range $query_vector]=>{$YIELD_DISTANCE_AS: vector_score}"
    )
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .paging(0, 4)
    .dialect(2)
)
create_query_table(
    range_query, queries[:1], encoded_queries[:1], {"range": 0.55}
)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |... (+1 more result)
# STEP_END
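# Unlike KNN, VECTOR_RANGE returns every document whose vector lies within the
# distance passed as $range (0.55 here) of the query vector, so the number of
# matches varies per query; .paging(0, 4) caps the output at four rows.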