Datature · Sole Author

VLM Fine-Tuning Pipeline

Jul 2025 — Present

Built the VLM fine-tuning pipeline from scratch as a separate service on GCP, opening up an entirely new product line. Supports multiple frontier vision-language models with efficient fine-tuning techniques and diverse training modalities including VQA, Chain-of-Thought reasoning, and video.

Highlights

  • Established the full service from scratch: CI/CD, GCP deployment, training initialiser, run manager integration
  • Supports Qwen2.5-VL, Qwen3-VL, NVILA, Cosmos-Reason1/2, and Kimi-VL
  • LoRA fine-tuning with configurable quantization, reducing GPU memory requirements for large-model training
  • Tensor parallelism for multi-GPU training of large models; OOM-resilient training loops
  • VQA training with schema design, data collator, and annotation kind handling
  • Chain-of-Thought (CoT) reasoning with structured evaluation returning phrase grounding indices
  • Open-ended freeform generation training for Cosmos-Reason2 and Qwen3-VL
  • Video training via PyAV ingestion with temporal expansion and evaluation preview
  • Intelliscribe caption generation microservice with JSON schema validation for structured outputs
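The OOM-resilient loop above can be sketched as follows. This is an illustration of the retry-with-smaller-batch idea only, not the production code; `train_step` and the `CudaOOMError` stand-in (for `torch.cuda.OutOfMemoryError`) are hypothetical names:

```python
class CudaOOMError(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError in this sketch."""

def resilient_steps(batches, train_step, min_batch_size=1):
    """Run train_step over batches; on OOM, halve the batch and retry."""
    losses = []
    for batch in batches:
        while True:
            try:
                losses.append(train_step(batch))
                break
            except CudaOOMError:
                if len(batch) <= min_batch_size:
                    raise  # cannot shrink further; surface the error
                batch = batch[: len(batch) // 2]  # retry on a smaller slice
    return losses
```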
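For the video path, the temporal sampling step can be sketched independently of PyAV. This shows only the uniform frame-index selection idea; the function name and signature are illustrative:

```python
def sample_frame_indices(num_frames, target):
    """Evenly sample up to `target` frame indices from a clip.

    Short clips are returned whole; longer clips are subsampled at a
    uniform stride so the selection spans the full duration.
    """
    if num_frames <= target:
        return list(range(num_frames))
    step = num_frames / target  # fractional stride across the clip
    return [int(i * step) for i in range(target)]
```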
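The structured-output check behind the caption service can be illustrated with a minimal flat-schema validator. The field names (`caption`, `tags`) and function are hypothetical examples, not the service's actual schema:

```python
import json

# Illustrative schema only: field name -> expected Python type
CAPTION_SCHEMA = {"caption": str, "tags": list}

def validate_structured_output(raw, schema=CAPTION_SCHEMA):
    """Parse a model's raw JSON string and check it against a flat schema.

    Returns (obj, errors); errors is empty when the output conforms.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    errors = []
    for field, expected in schema.items():
        if field not in obj:
            errors.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return obj, errors
```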

Tech Stack

PyTorch · LoRA · Quantization · Tensor Parallelism · DeepSpeed · PyAV · GCP · Docker · Python