What You’ll Do
Integrate hosted LLM APIs (e.g., OpenAI, Anthropic) and custom models to support intelligent in-product behavior.
Build and fine-tune transformer models using PyTorch and HuggingFace.
Design and deploy retrieval-augmented generation (RAG) pipelines with vector databases (e.g., pgvector) and graph-based reasoning (e.g., Neo4j).
Develop scalable inference systems using vLLM, speculative decoding, and optimized serving techniques.
Build modular, production-grade pipelines for training, evaluation, and deployment.
Collaborate closely with product, design, and full-stack teams to ship features that bring AI to end users.
Own the Docker, Cloud Run, and GCP infrastructure, ensuring speed, reliability, and observability.
What You Bring
Strong Python engineering background and a habit of writing clean, tested, maintainable code.
Proven experience building with transformer-based models, including custom training and fine-tuning.
Deep familiarity with HuggingFace, PyTorch, tokenization, and evaluation frameworks.
Experience integrating and orchestrating LLM APIs (OpenAI, Anthropic) into user-facing products.
Understanding of semantic search, vector storage (FAISS, pgvector), and hybrid symbolic-neural approaches.
Experience designing or consuming graph-based knowledge systems (e.g., Neo4j, property graphs).
Ability to build and debug scalable training and inference systems.
Bonus Points For
Hands-on experience with Docker and production deployment on Google Cloud (GKE, Cloud Run).
Experience with RLHF, reward models, or reinforcement learning for LLM alignment.
Knowledge of document understanding, OCR, or structured PDF parsing.
Exposure to monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry).
Background in linguistics, semantics, or computational reasoning.