Data Engineer — TetraScience
Apr 2024 - present
- Lab data pipeline (top-10 pharma). End-to-end ingestion, standardization, and labeling of scientific instrument data with delivery into enterprise ELN/LIMS; roughly $400k/year in operating savings and easier data access for scientists.
- Data migration & integrity (global pharma). Multi-step framework to migrate ~10 TB between S3 environments with custom checksums and reconciliation; migrations completed with zero data loss.
- Secure data app modernization (biopharma). FastAPI app upgraded with SSO (OAuth2/OIDC), CSRF/CORS hardening, optimized Elasticsearch queries, and local caching to improve reliability and reduce latency.
- QC automation at scale. 18 production-grade data apps/pipelines replacing manual work; introduced Git-based versioning and lightweight CI in a change-resistant environment.
- Instrument onboarding (R&D). Full onboarding: extraction, transformation, and storage pipelines for downstream analytics and reporting.
- Schema evolution. Extended models and pipelines for new instrument/assay outputs so new experimental data could be analyzed.
- Validation DSL. Domain-specific language and Python library to compile rules and validate JSON for automated data quality.
- Terraform provider. Custom Terraform provider (Go) to define workflows in YAML and promote them across environments, enabling one-click use-case deployments.
- Engineering quality. CI/CD quality gates and code reviews to improve reliability and maintainability.