# Neural search demo
## With Qdrant + BERT + FastAPI
This repository contains the code for the Neural Search for startups [demo](https://demo.qdrant.tech).
The demo is based on the vector search engine [Qdrant](https://github.com/qdrant/qdrant).
## Requirements
Install the Python requirements:
```
pip install poetry
poetry install
```
You will also need [Docker](https://docs.docker.com/get-docker/) and [docker-compose](https://docs.docker.com/compose/install/).
## Quick Start <a href="https://replit.com/new/github/qdrant/qdrant_demo"><img align="right" src="https://replit.com/badge/github/qdrant/qdrant_demo" alt="Run on Repl.it"></a>
To launch this demo locally, you will first need to download the data.
The source of the original data is [https://www.startups-list.com/](https://www.startups-list.com/).
You can download the data via the following command:
```bash
wget https://storage.googleapis.com/generall-shared-data/startups_demo.json -P data/
```
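Before indexing, it can help to confirm that the download succeeded. The sketch below prints the first few records. It assumes the file is newline-delimited JSON (one object per line), which is not stated above, and `read_records` is a hypothetical helper, not part of this repository.

```python
import json
from itertools import islice
from pathlib import Path


def read_records(path, limit=3):
    """Return up to `limit` records from a newline-delimited JSON file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        # Assumption: each non-empty line holds exactly one JSON object.
        lines = (line for line in f if line.strip())
        for line in islice(lines, limit):
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    data_file = Path("data/startups_demo.json")
    if data_file.exists():
        for record in read_records(data_file):
            print(record)
    else:
        print(f"{data_file} not found; run the wget command first")
```

If the file turns out to be a single JSON array instead, `json.load` on the whole file would be the simpler choice.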
To launch the service locally, run:
```
docker-compose -f docker-compose-local.yaml up
```
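The containers may take a moment to accept connections. As a rough sketch, the helper below polls a TCP port until it answers. The host and port are assumptions: 6333 is Qdrant's default HTTP port, but the mapping in `docker-compose-local.yaml` may differ, and `wait_for_port` is a hypothetical helper, not part of this repository.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumption: Qdrant is published on localhost:6333.
    ready = wait_for_port("localhost", 6333, timeout=5.0)
    print("Qdrant reachable" if ready else "Qdrant not reachable yet")
```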
After the service has started, you can upload the initial data to the search engine:
```
# Init neural index
python -m qdrant_demo.init_collection_startups
```
After a successful upload, the neural search API will be available at [http://localhost:8000/docs](http://localhost:8000/docs).
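The API can also be called from a script. The following is a minimal sketch using only the standard library. The endpoint path `/api/search` and the query parameter `q` are assumptions, so check the `/docs` page for the actual route and parameters. `build_search_url` and `search` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, path="/api/search"):
    """Build a GET URL for a text query. `path` and the `q` parameter are assumed."""
    return f"{base_url.rstrip('/')}{path}?{urlencode({'q': query})}"


def search(base_url, query):
    """Send the query and return the decoded JSON response."""
    with urlopen(build_search_url(base_url, query), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print(search("http://localhost:8000", "startups about renewable energy"))
    except (OSError, ValueError) as exc:
        # OSError covers connection and HTTP errors; ValueError covers non-JSON replies.
        print(f"Request failed: {exc}")
```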
You can play with the data in the following [Colab Notebook](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing).
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing)
## Start with Crunchbase data
Alternatively, you can use a larger dataset of companies provided by [Crunchbase](https://www.crunchbase.com/).
You will need to register at [https://www.crunchbase.com/](https://www.crunchbase.com/) and get an API key.
```bash
# Download data
wget 'https://api.crunchbase.com/odm/v4/odm.tar.gz?user_key=<CRUNCHBASE-API-KEY>' -O odm.tar.gz
```
Decompress the data and put `organizations.csv` into the `./data` folder.
```bash
# Decompress data
tar -xvf odm.tar.gz
mv odm/organizations.csv ./data
```
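To check that the file landed in the right place and is readable, a short sanity check such as the one below may be useful. It assumes only that the CSV is UTF-8 encoded with a header row. No column names are assumed, and `summarize_csv` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    csv_file = Path("data/organizations.csv")
    if csv_file.exists():
        columns, rows = summarize_csv(csv_file)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"{csv_file} not found; extract the archive first")
```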
After that, you can index the Crunchbase data into Qdrant.
```bash
# Init neural index
python -m qdrant_demo.init_collection_crunchbase
```