Senior Data Engineer II - Electronic Health Records (EHR)
Company: Formation Bio
Location: New York City
Posted on: January 27, 2026
Job Description:
About Formation Bio

Formation Bio is a tech- and AI-driven pharma
company differentiated by radically more efficient drug
development. Advancements in AI and drug discovery are creating
more candidate drugs than the industry can progress because of the
high cost and time of clinical trials. Recognizing that this
development bottleneck may ultimately limit the number of new
medicines that can reach patients, Formation Bio, founded in 2016
as TrialSpark Inc., has built technology platforms, processes, and
capabilities to accelerate all aspects of drug development and
clinical trials. Formation Bio partners, acquires, or in-licenses
drugs from pharma companies, research organizations, and biotechs
to develop programs past clinical proof of concept and beyond,
ultimately helping to bring new medicines to patients. The company
is backed by investors across pharma and tech, including a16z,
Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark
Capital, SV Angel Growth, and others.

At Formation Bio, our values
are the driving force behind our mission to revolutionize the
pharma industry. Every team and individual at the company shares
these same values, and every team and individual plays a key part
in our mission to bring new treatments to patients faster and more
efficiently.

About the Position

We’re looking for a Senior Data
Engineer to join the Data Platform team at Formation Bio to help
transform Electronic Health Records (EHR) data into structured,
analytics-ready assets. In this role, you’ll be partnering closely
with our Data Science team to model, transform, and refine data for
operational and scientific use cases. This position sits at the
intersection of healthcare data engineering, modern data platform
infrastructure, and generative AI. While your initial focus will be
on building high-quality EHR models for the Formation Bio platform,
you’ll also contribute to our broader data architecture by
leveraging tools like Snowflake, Dagster, and dbt to enable
scalable, governed, and high-reliability pipelines. The ideal
candidate combines deep data engineering experience with both GenAI
fluency (e.g., LLM-based entity extraction, summarization,
classification) and strong technical expertise with modern data
tooling. You’ll play a key role in shaping how healthcare data
becomes discoverable, structured, and impactful across the
organization.

Responsibilities

- Model and transform raw EHR data into clean, canonical, and
  analytics-ready datasets using SQL, Python, and clinical standards
  like FHIR, HL7, or OMOP.
- Build and manage scalable data pipelines using Dagster for
  orchestration, dbt for transformation, and Snowflake as the primary
  compute and storage engine.
- Collaborate with Data Science and product stakeholders to
  co-develop cohort logic, derived features, and structured outputs
  that meet real-world scientific needs.
- Apply Generative AI techniques within transformation layers, using
  LLMs for named entity recognition, document summarization,
  classification, and schema alignment.
- Write robust, testable, and version-controlled code that adheres to
  CI/CD and data governance best practices.
- Implement data validation and observability frameworks to ensure
  quality, trust, and reproducibility of datasets.
- Document transformation logic, assumptions, and data lineage in
  collaboration with metadata and cataloging systems.
- Contribute to the evolution of the Data Platform by helping define
  standards, patterns, and best practices around GenAI and
  platform-scale data engineering.

About You

- You have 5 years of experience in data engineering, ideally with at
  least 2 years working in healthcare or life sciences, including
  direct exposure to EHR datasets.
- Experience with ontologies and biomedical schemas (e.g. UMLS,
  LOINC, ICD9/10, MeSH)
- Experience and understanding of modalities found within EHR
  datasets, including billing claims, lab results, visit notes, and
  images
- Experience in biomedical feature engineering, e.g. variable
  transformations and derivatives
- You’re fluent in SQL and Python, and you’ve built and maintained
  production-grade pipelines that support analytics, science, or
  operational workflows.
- You have hands-on expertise with modern data infrastructure,
  including:
- You’re experienced in applying GenAI techniques within pipelines,
  including prompt engineering, LLM-based entity extraction, and
  classification/summarization workflows.
- You value clarity, documentation, and structured thinking,
  especially when working with complex data like healthcare records.
- You have a growth mindset and are excited to build bridges between
  isolated data environments and governed, shared models that power
  scientific innovation.
- Bonus: You’ve worked in regulated or privacy-sensitive data
  environments, and you’re familiar with governance models for PHI or
  sensitive data.

Formation Bio is
prioritizing hiring in key hubs, primarily the New York City and
Boston metro areas, with additional growth in the Research Triangle
(NC) and San Francisco Bay Area. Please only apply if you reside in
these locations or are willing to relocate.

Compensation: The
target salary range for this role is: $230,000 - $280,000. Salary
ranges are informed by a number of factors including geographic
location. The range provided includes base salary only. In addition
to base salary, we offer equity, comprehensive benefits, generous
perks, hybrid flexibility, and more. If this range doesn't match
your expectations, please still apply because we may have something
else for you. You will receive consideration for employment without
regard to race, color, religion, gender, gender identity or
expression, sexual orientation, national origin, genetics,
disability, age, or veteran status.