Datature · Export: Sole Author · Inference: Co-Author
Model Export & Inference
Jul 2022 — Present
Authored the Model Export Service (Cloud Run) and co-authored the Model Hosting & Inference Service (GKE). Together these enable customers to deploy trained models to edge devices, mobile platforms, and high-throughput production environments.
Highlights
▸ Cross-framework export to TFLite, CoreML, TensorRT, OpenVINO, ONNX, TensorFlow, and PyTorch
▸ Float16/Int8 quantization — up to 75% model size reduction for edge and mobile deployment
▸ Model pruning of up to 90% of parameters for storage-constrained and real-time applications
▸ Exports run in a worker process via async apply with a timeout — prevents Cloud Run request hangs under load
▸ Triton Inference Server integration with Numba JIT optimisation for high-throughput production inference
▸ Supports all onboarded architectures across image, video, and 3D volumetric inputs (NIfTI, DICOM)
▸ Active learning pipeline with entropy-based metrics and bitmask annotation re-upload
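The export-in-a-worker-process pattern above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `run_export` and `export_with_timeout` are hypothetical names, and the real export work is stubbed out.

```python
# Sketch: run an export in a separate process and bound it with a timeout,
# so a stuck export cannot hang the serving request indefinitely.
import multiprocessing


def run_export(model_name):
    # Placeholder for the real export work (name is illustrative).
    return f"exported:{model_name}"


def export_with_timeout(model_name, timeout_s=5.0):
    with multiprocessing.Pool(processes=1) as pool:
        result = pool.apply_async(run_export, (model_name,))
        try:
            # Raises multiprocessing.TimeoutError if the worker is stuck.
            return result.get(timeout=timeout_s)
        except multiprocessing.TimeoutError:
            pool.terminate()  # kill the hung worker rather than wait on it
            return None
```

The key design point is that `result.get(timeout=...)` bounds the wait on the request path, while `terminate()` reclaims the worker instead of leaking a hung process.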
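The entropy-based metric in the active learning bullet can be illustrated with a short sketch. The function names and the per-image probability input format are assumptions for illustration, not the pipeline's actual interface.

```python
# Sketch: score predictions by Shannon entropy; the most uncertain
# images are the best candidates for re-annotation.
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_for_labeling(predictions):
    """Sort image ids by descending entropy (most uncertain first).

    `predictions` maps image id -> list of class probabilities.
    """
    scored = [(img_id, prediction_entropy(p)) for img_id, p in predictions.items()]
    return [img_id for img_id, _ in sorted(scored, key=lambda t: -t[1])]
```

A confident prediction such as `[0.98, 0.01, 0.01]` scores near zero, while a uniform distribution over k classes scores log(k), the maximum, so ranking by entropy surfaces the images the model is least sure about.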
Tech Stack
TensorRT · TFLite · CoreML · OpenVINO · ONNX · Triton · Numba · Cloud Run · GKE · Docker · CUDA