Arkadii Mirzabekian | Data Engineer

Recent experience

Hands-on work across lab data platforms, large migrations, productized tooling, and team quality practices.

Life sciences data platform · Remote

Apr 2024 – present

Lab data pipeline (top-10 pharma). End-to-end ingestion, standardization, and labeling of scientific instrument data with delivery into enterprise ELN/LIMS; roughly $400k/year in operating savings and easier data access for scientists.
Data migration & integrity (global pharma). Multi-step framework to migrate ~10 TB between S3 environments with custom checksums and reconciliation; migrations completed with zero data loss.
Secure data app modernization (biopharma). FastAPI app upgraded with SSO (OAuth2/OIDC), CSRF/CORS hardening, optimized Elasticsearch queries, and local caching for reliability and latency.
QC automation at scale. 18 production-grade data apps/pipelines replacing manual work; introduced Git-based versioning and lightweight CI in a change-resistant environment.
Instrument onboarding (R&D). Full onboarding: extraction, transformation, and storage pipelines for downstream analytics and reporting.
Schema evolution. Extended models and pipelines for new instrument/assay outputs so new experimental data could be analyzed.
Validation DSL. Domain-specific language and Python library to compile rules and validate JSON for automated data quality.
Terraform provider. Custom Terraform provider (Go) to define workflows in YAML and promote them across environments, enabling one-click use-case deployments.
Engineering quality. CI/CD quality gates and code reviews to improve reliability and maintainability.

Marketing data, fast-paced team · Remote

Jul 2022 – Jun 2023

AWS & Python ETLs. Pipelines ingesting from Redshift, S3, and APIs, built on a Fargate and Redshift stack.
Analytics aggregation. Transforms for BI dashboards with monitoring alerts and fast error handling.

Ad tech, large corporate R&D · Hybrid

2019 – Jun 2022

Behavior-based predictions. Led Airflow pipelines across sources; matched session data with timed content.
Video content evaluation. PyTorch deep learning for video vectorization and a complex ETL for behavior insights.
Ad-hoc analytics. Timely analytics and troubleshooting for internal and external clients.

Stack I use most often for pipelines, platforms, and data stores.

Teaching alongside industry work.

Master's level · On-site · 2019, 2020

Hands-on perspective on big data ecosystems (Spark, HDFS, Kafka); course noted for practical novelty.

Formal training and credentials.

Moscow, Russia · Secondary specialized education, computer systems programming · 2012 – 2016

Moscow, Russia · Advanced training: product management · 2021 – 2022