Datature · Sole Author

Model Training Service

Jul 2022 — Present

Sole author of the Model Training Service, deployed on Google Compute Engine. Built a framework-agnostic pipeline that lets customers train with JAX, TensorFlow, or PyTorch within a single service, covering 16+ model architectures across the major computer vision task types.

Highlights

  • Framework-agnostic pipeline supporting JAX, TensorFlow, and PyTorch, so customers can train on any of the three without migration overhead
  • Multi-GPU support: data parallelism for all frameworks, model parallelism for JAX
  • Onboarded 16+ architectures: YOLOv8/11/26, Mask R-CNN, EfficientDet, DeepLabV3, PaliGemma, D-FINE, RF-DETR, EfficientAD, and more
  • Sliding window (SAHI) training across all task types — improves performance on high-resolution imagery with configurable overlap and NMS
  • Class auto-balancing with distribution control and max sample caps, reducing manual data preparation
  • Multilabel classification with Brier score and co-occurrence confusion matrices
  • Continual learning — users resume training from prior checkpoints without retraining from scratch
  • Multi-stage OpenAI-powered hyperparameter recommender that lowers the barrier for non-ML users
  • Anomaly detection via Anomalib integration (EfficientAD) with full training and export pipeline
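
As an illustration of the sliding-window idea mentioned above, the following is a minimal sketch of how overlapping tiles could be laid out over a high-resolution image. It is not the service's actual code: the function name `sliding_windows`, its parameters, and the choice to place the last tile flush with the image edge are all assumptions made for this example, and the NMS step that merges per-tile predictions is omitted.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def sliding_windows(width: int, height: int, tile: int, overlap: float) -> List[Box]:
    """Hypothetical helper: tile a width x height image with square windows.

    `overlap` is the fraction of a tile shared with its neighbour, in [0, 1).
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")

    # Distance between the starts of consecutive tiles.
    stride = max(1, int(tile * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        # An image smaller than one tile gets a single, clipped window.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        # Assumption: the final tile sits flush with the image edge,
        # so no pixels on the right or bottom border are skipped.
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in sliding_windows(1000, 500, tile=512, overlap=0.25):
        print(box)
```

With a 1000 x 500 image, 512-pixel tiles, and 25% overlap, this layout yields three windows across a single row. Each window would be trained on or predicted on independently, with overlapping detections reconciled afterwards.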

Tech Stack

PyTorch · TensorFlow · JAX · Ultralytics · Anomalib · MONAI · GCE · Docker · GitHub Actions · Python

Related Articles