Datature · Export: Sole Author · Inference: Co-Author

Model Export & Inference

Jul 2022 — Present

Authored the Model Export Service (Cloud Run) and co-authored the Model Hosting & Inference Service (GKE). Together these enable customers to deploy trained models to edge devices, mobile platforms, and high-throughput production environments.

Highlights

  • Cross-framework export to TFLite, CoreML, TensorRT, OpenVINO, ONNX, TensorFlow, and PyTorch
  • Float16/Int8 quantization — up to 75% model size reduction for edge and mobile deployment
  • Model pruning of up to 90% of parameters for storage-constrained and real-time applications
  • Export wrapped in multiprocessing with async apply and timeout — prevents Cloud Run request hangs under load
  • Triton Inference Server integration with Numba JIT optimisation for high-throughput production inference
  • Supports all onboarded architectures across image, video, and 3D volumetric inputs (NIfTI, DICOM)
  • Active learning pipeline with entropy-based metrics and bitmask annotation re-upload
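
The 75% size-reduction figure for int8 follows from the storage arithmetic: each float32 weight (4 bytes) is replaced by an int8 value (1 byte) plus a shared scale and zero-point. The service's actual converter pipelines (TFLite, TensorRT, etc.) are not shown here; this is a minimal sketch of the affine quantization arithmetic, with all names hypothetical:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine (asymmetric) int8 quantization of a float32 tensor.

    Returns the int8 tensor plus the scale/zero-point needed to
    dequantize. Storing int8 instead of float32 cuts size by 75%.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    # Map the observed range [w_min, w_max] onto the int8 range [-128, 127].
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values; error is bounded by ~scale/2 per weight."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print(q.nbytes / weights.nbytes)  # 0.25 → 75% smaller
```

Real exporters additionally use per-channel scales and calibration data to pick ranges, but the size accounting is the same.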
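
The multiprocessing wrapper exists because a conversion that wedges inside a native library would otherwise hang the whole Cloud Run request. A minimal sketch of the pattern, assuming a worker-pool approach with `apply_async` and a bounded `get` (function names here are hypothetical, not the service's API):

```python
import multiprocessing as mp
import time

def _run_export(model_path: str) -> str:
    # Placeholder for a real conversion step (e.g. ONNX -> TFLite).
    return f"exported:{model_path}"

def _slow_export(model_path: str) -> str:
    time.sleep(60)  # simulates a conversion that never finishes in time
    return f"exported:{model_path}"

def export_with_timeout(fn, args, timeout_s: float = 300.0):
    """Run an export job in a worker process, killing it on timeout.

    apply_async + get(timeout=...) bounds how long the request waits,
    and terminating the pool reaps a hung worker so the handler can
    return an error instead of hanging.
    """
    pool = mp.Pool(processes=1)
    try:
        return pool.apply_async(fn, args).get(timeout=timeout_s)
    except mp.TimeoutError:
        pool.terminate()  # kill the stuck worker outright
        return None
    finally:
        pool.close()
        pool.join()
```

Running the job in a separate process (rather than a thread) matters because a hung native call cannot be interrupted from Python; terminating the process is the only reliable escape hatch.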
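
The entropy metric in the active learning bullet works by scoring each prediction's class-probability distribution: near-uniform outputs score high and are the best candidates to send back for annotation. A minimal sketch of that scoring (the pipeline's actual metrics and thresholds are not shown; names are illustrative):

```python
import numpy as np

def prediction_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy per sample over its class probabilities.

    High entropy = uncertain prediction = a strong candidate for
    human labelling in the active learning loop.
    """
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def rank_for_labelling(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most uncertain samples, most uncertain first."""
    ent = prediction_entropy(probs)
    return np.argsort(ent)[::-1][:k]
```

For example, a confident prediction like `[0.99, 0.01]` scores far lower than `[0.5, 0.5]`, so the ranking surfaces the ambiguous samples first.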

Tech Stack

TensorRT · TFLite · CoreML · OpenVINO · ONNX · Triton · Numba · Cloud Run · GKE · Docker · CUDA

Related Articles