Containerized Voice Identification with Resemblyzer & QdrantDB
Written on April 2nd, 2025 by Cody Snider
Introduction
Voice recognition is evolving fast. In this guide, we build a containerized speaker identification system using Resemblyzer for voice embeddings, QdrantDB for vector storage, and a FastAPI interface for ease of access. If you’re tired of dependency hell and want a scalable solution that runs anywhere, read on.
System Overview
The system consists of two primary endpoints:
- /upload_identity: Upload known audio clips (MP3/WAV) to register identities.
- /identify: Submit an audio sample to identify the speaker against the stored identities.
Persistent data is maintained under /data/identities within the container.
Building the Container
The Dockerfile below sets up the environment with necessary dependencies like ffmpeg for audio processing and build-essential for compiling C extensions.
FROM python:3.9-slim
RUN apt-get update && apt-get install -y ffmpeg build-essential && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
The accompanying docker-compose.yml file integrates QdrantDB:
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - data:/data
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant:latest
    environment:
      - QDRANT__STORAGE__DATA_PATH=/data/qdrant
    volumes:
      - data:/data
    ports:
      - "6333:6333"
volumes:
  data:
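Note that depends_on only controls start order; it does not wait until Qdrant is ready to accept requests. One way to handle this is a small retry loop that the API runs before it first talks to the database. The following is a minimal sketch, not code from the project: the helper names wait_until_ready and qdrant_is_up are hypothetical, and the health URL assumes the compose service name qdrant, port 6333, and Qdrant's /healthz endpoint.

```python
import time
import urllib.request
from typing import Callable


def qdrant_is_up(url: str = "http://qdrant:6333/healthz", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` answers with status 200.

    The default URL is an assumption: `qdrant` is the compose service name
    and 6333 is the port published in docker-compose.yml.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.URLError, connection errors and timeouts all derive from OSError.
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

At startup the API could call wait_until_ready(qdrant_is_up) and exit with an error if it returns False. Passing the check as a callable keeps the retry logic easy to test without a running database.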
FastAPI Interface and Endpoint Details
The API leverages Resemblyzer to preprocess audio files and generate 256-dimensional embeddings. These embeddings are stored in QdrantDB, enabling rapid similarity searches.
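To make "similarity search" concrete, here is a plain-Python sketch that ranks stored embeddings against a query by cosine similarity. It is illustrative only: Qdrant uses an approximate index rather than a linear scan, cosine distance is an assumption (the collection configuration is not shown here), and the toy 3-dimensional vectors and speaker names stand in for real 256-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float], stored: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three "speakers" with 3-dimensional embeddings.
stored = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
    "speaker_c": [0.7, 0.7, 0.0],
}
ranked = top_k_matches([0.9, 0.1, 0.0], stored, k=2)
print(ranked)  # speaker_a ranks first, then speaker_c
```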
Below is the /identify endpoint with execution time logging:
@app.post("/identify")
async def identify(file: UploadFile = File(...), top_k: int = 3):
    # top_k has no Form(...) wrapper, so FastAPI reads it from the query string.
    start_time = time.time()
    if not file.filename.lower().endswith((".wav", ".mp3")):
        raise HTTPException(status_code=400, detail="Unsupported file type.")
    # Save temporarily
    temp_path = f"/tmp/{int(time.time())}_{file.filename}"
    with open(temp_path, "wb") as f:
        shutil.copyfileobj(file.file, f)
    try:
        wav = preprocess_wav(temp_path)
    except Exception as e:
        os.remove(temp_path)
        raise HTTPException(status_code=400, detail=f"Error processing audio: {e}")
    os.remove(temp_path)
    # Embed the utterance and look up its nearest stored neighbours.
    embedding = encoder.embed_utterance(wav)
    search_results = qdrant.search(
        collection_name=COLLECTION_NAME,
        query_vector=embedding.tolist(),
        limit=top_k
    )
    if not search_results:
        exec_time = time.time() - start_time
        print(f"Identity check execution time: {exec_time:.3f} seconds")
        return JSONResponse(content={"message": "No identities found.", "execution_time": exec_time})
    # Group scores by identity name, then pick the name with the highest average score.
    matches = {}
    for point in search_results:
        name = point.payload.get("name", "unknown")
        matches.setdefault(name, []).append(point.score)
    avg_matches = {name: sum(scores)/len(scores) for name, scores in matches.items()}
    best_match = max(avg_matches.items(), key=lambda x: x[1])
    exec_time = time.time() - start_time
    print(f"Identity check execution time: {exec_time:.3f} seconds")
    return JSONResponse(content={
        "best_match": best_match[0],
        "score": best_match[1],
        "raw_results": [
            {"id": p.id, "name": p.payload.get("name", "unknown"), "score": p.score} for p in search_results
        ],
        "execution_time": exec_time
    })
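The scoring step at the end of the endpoint groups the returned hits by name, averages each group, and picks the highest average. The standalone sketch below isolates that logic so its behaviour is easy to see. The function name best_average_match and the sample scores are hypothetical, and hits are simplified to (name, score) pairs rather than Qdrant result objects.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def best_average_match(hits: Iterable[Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """Group (name, score) hits by name and return the name with the highest mean score.

    Returns None when there are no hits.
    """
    by_name: Dict[str, List[float]] = defaultdict(list)
    for name, score in hits:
        by_name[name].append(score)
    if not by_name:
        return None
    averages = {name: sum(scores) / len(scores) for name, scores in by_name.items()}
    return max(averages.items(), key=lambda item: item[1])


# Hypothetical top-3 result: speaker_a has the single best hit but also a weak one.
hits = [("speaker_a", 0.90), ("speaker_b", 0.85), ("speaker_a", 0.60)]
result = best_average_match(hits)
print(result)  # speaker_b wins: its average 0.85 beats speaker_a's 0.75
```

As the example shows, averaging lets one weak clip pull an identity below a competitor even when that identity produced the single closest hit. Whether that is desirable depends on how many clips are registered per identity and on the chosen top_k; taking the maximum score per name would be one alternative.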
Testing the API
You can test the endpoints using cURL commands:
# Upload a known identity
curl -X POST "http://localhost:8000/upload_identity" \
  -F "identity_name=Ronaldo" \
  -F "file=@/path/to/Ronaldo.mp3"
# Identify a speaker from a test audio clip (top_k is read from the query string)
curl -X POST "http://localhost:8000/identify?top_k=3" \
  -F "file=@/path/to/test_audio.mp3"
Conclusion
This containerized voice identification system combines Resemblyzer’s efficient audio embeddings with QdrantDB’s fast vector search, all wrapped in a FastAPI interface for simplicity and scalability. By containerizing the solution, you avoid dependency conflicts and ensure that your environment is consistent across deployments. If you need a robust and repeatable voice biometric solution, this setup delivers.
GitHub: codysnider/resemblyzer-qdrantdb