docs: add config in readme
parent
5c9f8ae42e
commit
34c762f7e7
|
@ -20,7 +20,7 @@ jobs:
|
|||
core.setFailed('token are not equivalent!')
|
||||
if: github.event.inputs.release_token != env.release_token
|
||||
env:
|
||||
release_token: ${{ secrets.LCSERVE_RELEASE_TOKEN }}
|
||||
release_token: ${{ secrets.VECTORDB_RELEASE_TOKEN }}
|
||||
|
||||
update-docker:
|
||||
needs: token-check
|
||||
|
@ -32,7 +32,7 @@ jobs:
|
|||
token: ${{ secrets.JINA_DEV_BOT }}
|
||||
inputs: '{ "release_token": "${{ env.release_token }}", "triggered_by": "TAG"}'
|
||||
env:
|
||||
release_token: ${{ secrets.LCSERVE_RELEASE_TOKEN }}
|
||||
release_token: ${{ secrets.VECTORDB_RELEASE_TOKEN }}
|
||||
|
||||
regular-release:
|
||||
needs: token-check
|
||||
|
|
41
README.md
|
@ -34,8 +34,8 @@ use and develop vector databases.
|
|||
- Serverless capacity: `vectordb` can be deployed in the cloud in serverless mode, allowing you to save resources and have the data available only when needed.
|
||||
|
||||
- Multiple ANN algorithms: `vectordb` contains different implementations of ANN algorithms. These are the ones offered so far; we plan to integrate more:
|
||||
- Exact NN Search: Implements Simple Nearest Neighbour Algorithm.
|
||||
- HNSWLib: Based on [HNSWLib](https://github.com/nmslib/hnswlib)
|
||||
- InMemoryExactNNVectorDB (Exact NN Search): Implements Simple Nearest Neighbour Algorithm.
|
||||
- HNSWVectorDB (based on HNSW): Based on [HNSWLib](https://github.com/nmslib/hnswlib)
|
||||
|
||||
<!--(THIS CAN BE SHOWN WHEN FILTER IS ENABLED)- Filter capacity: `vectordb` allows you to have filters on top of the ANN search. -->
|
||||
|
||||
|
@ -43,7 +43,7 @@ use and develop vector databases.
|
|||
|
||||
## 🏁 Getting Started
|
||||
|
||||
To get started with Vector Database, follow these steps. This example uses `HNSWVecDB`:
|
||||
To get started with Vector Database, follow these steps. This example uses `InMemoryExactNNVectorDB`:
|
||||
|
||||
1. Install `vectordb`:
|
||||
|
||||
|
@ -62,10 +62,10 @@ class MyTextDoc(TextDoc):
|
|||
|
||||
Make sure that the schema has a field `embedding` as a `tensor` type with shape annotation as in the example.
|
||||
|
||||
3. Use any of the pre-built databases with the document schema (InMemoryExactNNVectorDB or HNSWLibDB):
|
||||
3. Use any of the pre-built databases with the document schema (InMemoryExactNNVectorDB or HNSWVectorDB):
|
||||
|
||||
```python
|
||||
from vectordb import InMemoryExactNNVectorDB, HNSWLibDB
|
||||
from vectordb import InMemoryExactNNVectorDB, HNSWVectorDB
|
||||
db = InMemoryExactNNVectorDB[MyTextDoc](workspace='./workspace_path')
|
||||
|
||||
db.index(inputs=DocList[MyTextDoc]([MyTextDoc(text=f'index {i}', embedding=np.random.rand(128)) for i in range(1000)]))
|
||||
|
@ -210,6 +210,37 @@ You can then list and delete your deployed DBs with `jc` command:
|
|||
|
||||
## ⚙️ Configure
|
||||
|
||||
The following parameters configure the behavior of each `VectorDB` type.
|
||||
|
||||
### InMemoryExactNNVectorDB
|
||||
|
||||
This database type performs an exhaustive search over the embeddings and therefore has very few configuration settings:
|
||||
|
||||
- workspace: The folder where the required data will be persisted.
|
||||
|
||||
```python
|
||||
InMemoryExactNNVectorDB[MyDoc](workspace='./vectordb')
|
||||
InMemoryExactNNVectorDB[MyDoc].serve(workspace='./vectordb')
|
||||
```
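
To illustrate what an exhaustive search means here, the following is a minimal pure-Python sketch, not `vectordb`'s actual implementation. The function name `exact_nn_search`, its `limit` parameter, and the choice of Euclidean distance are assumptions made only for this example.

```python
import math


def exact_nn_search(query, embeddings, limit=3):
    """Return (index, distance) pairs for the `limit` closest embeddings."""
    scored = []
    for idx, vector in enumerate(embeddings):
        # Compare the query against every stored vector (no index structure).
        scored.append((idx, math.dist(query, vector)))
    # Every vector was compared, so sorting by distance gives the exact result.
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]


# Hypothetical toy data: four 2-dimensional embeddings.
embeddings = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
results = exact_nn_search([0.0, 0.0], embeddings, limit=2)
print(results)
```

Because every stored vector is compared against the query, the cost grows linearly with the number of documents, which is why there is nothing to tune beyond `workspace`.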
|
||||
|
||||
### HNSWVectorDB
|
||||
|
||||
This database implements Approximate Nearest Neighbour search based on the HNSW algorithm, using [HNSWLib](https://github.com/nmslib/hnswlib).
|
||||
|
||||
It contains more configuration options:
|
||||
|
||||
- workspace: The folder where the required data will be persisted.
|
||||
|
||||
The remaining options tune the performance and accuracy of the NN search algorithm. You can find more details in the [HNSWLib README](https://github.com/nmslib/hnswlib).
|
||||
|
||||
- space: name of the space, related to the similarity metric used (can be one of "l2", "ip", or "cosine"), default: "l2"
|
||||
- max_elements: initial capacity of the index, which is increased dynamically, default: 1024
|
||||
- ef_construction: parameter that controls speed/accuracy trade-off during the index construction, default: 200
|
||||
- ef: parameter controlling query time/accuracy trade-off, default: 10
|
||||
- M: parameter that defines the maximum number of outgoing connections in the graph, default: 16
|
||||
- allow_replace_deleted: enables replacing deleted elements with newly added ones, default: False
|
||||
- num_threads: number of threads used by `index` and `search`, default: 1
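
As a rough guide to what the `space` option selects, the sketch below computes the three distances in plain Python, following the formulas given in the HNSWLib README. The helper names (`l2_distance`, `ip_distance`, `cosine_distance`, `SPACES`) are hypothetical and are not part of `vectordb` or HNSWLib.

```python
import math


def l2_distance(a, b):
    # "l2": squared Euclidean distance, sum((a_i - b_i)^2).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    # "ip": one minus the inner product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # "cosine": one minus the cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical lookup table keyed by the accepted `space` values.
SPACES = {"l2": l2_distance, "ip": ip_distance, "cosine": cosine_distance}

a, b = [1.0, 0.0], [0.0, 2.0]
distances = {name: fn(a, b) for name, fn in SPACES.items()}
print(distances)
```

Note that "l2" is the squared distance, so its values are not directly comparable with those of "ip" or "cosine".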
|
||||
|
||||
## 🛣️ Roadmap
|
||||
|
||||
We have big plans for the future of Vector Database! Here are some of the features we have in the works:
|
||||
|
|
|
@ -84,7 +84,7 @@ class InMemoryExactNNIndexer(TypedExecutor):
|
|||
|
||||
def close(self):
|
||||
if self._index_file_path is not None:
|
||||
self._indexer.persist(self._index_file_path)
|
||||
self._indexer.persist()