Data engineering · Life sciences & scale

Arkadii Mirzabekian

I build reliable ingestion, validation, and platform tooling so teams can trust their data end to end.

Bochum, Germany · Remote-friendly

01

Recent experience

Hands-on work across lab data platforms, large migrations, productized tooling, and team quality practices.

Data Engineer — TetraScience

Life sciences data platform · Remote

Apr 2024 – present

  • Lab data pipeline (top-10 pharma). End-to-end ingestion, standardization, and labeling for scientific instruments with delivery into enterprise ELN/LIMS — roughly $400k/year operating savings and better access for scientists.
  • Data migration & integrity (global pharma). Multi-step framework to migrate ~10 TB between S3 environments with custom checksums and reconciliation; migrations completed with zero data loss.
  • Secure data app modernization (biopharma). FastAPI app upgraded with SSO (OAuth2/OIDC), CSRF/CORS hardening, optimized Elasticsearch queries, and local caching for reliability and latency.
  • QC automation at scale. 18 production-grade data apps/pipelines replacing manual work; introduced Git-based versioning and lightweight CI in a change-resistant environment.
  • Instrument onboarding (R&D). Full onboarding: extraction, transformation, and storage pipelines for downstream analytics and reporting.
  • Schema evolution. Extended models and pipelines for new instrument/assay outputs so new experimental data could be analyzed.
  • Validation DSL. Domain-specific language and Python library to compile rules and validate JSON for automated data quality.
  • Terraform provider. Custom Terraform provider (Go) to define workflows in YAML and promote across environments for one-click use-case deployments.
  • Engineering quality. CI/CD quality gates and code reviews to improve reliability and maintainability.

Data Engineer — Playrix

Marketing data, fast-paced team · Remote

Jul 2022 – Jun 2023

  • AWS & Python ETLs. Pipelines pulling from Redshift, S3, and APIs, built on a stack that included Fargate and Redshift.
  • Analytics aggregation. Transforms for BI dashboards with monitoring alerts and fast error handling.

Data Engineer / Scientist — NRA/NSK R&D

Ad tech, large corporate R&D · Hybrid

2019 – Jun 2022

  • Behavior-based predictions. Led Airflow pipelines across sources; matched session data with timed content.
  • Video content evaluation. PyTorch deep learning for video vectorization and a complex ETL for behavior insights.
  • Ad-hoc analytics. Timely analytics and troubleshooting for internal and external clients.

02

Key tools

Stack I use most often for pipelines, platforms, and data stores.

Languages

  • Python
  • Rust
  • Go
  • SQL

Cloud

  • AWS
  • GCP

ETL / ELT

  • Airflow
  • Luigi
  • dbt

Databases

  • Redshift
  • PostgreSQL
  • MySQL
  • MongoDB
  • ClickHouse

Delivery

  • Git

03

Academics

Teaching alongside industry work.

Lecturer — Big Data Analysis and Storage Tools

Master's level · On-site · 2019, 2020

  • Hands-on perspective on big data ecosystems (Spark, HDFS, Kafka); course noted for practical novelty.

04

Education & certificates

Formal training and credentials.

Moscow College of Informatics and Computer Engineering

Moscow, Russia · Secondary specialized education, computer systems programming · 2012 – 2016

Netology

Moscow, Russia · Advanced training: product management · 2021 – 2022

Certificates

  • IELTS English C1 · 2023
  • AWS Solutions Architect · Nov 2023 – Nov 2026