Role overview
Help design and implement software that supports research productivity and reliability. Work with researchers to develop software and data services that meet modern standards for reproducible code.
What you'll work on
Lead analytic development across several ongoing clinical research initiatives, and implement software solutions that meet modern standards for reproducible code.
A research lab studying suicide in the Department of Psychology at Harvard University is seeking to hire a Full-Stack Machine Learning Engineer (MLE) / Data Scientist (DS) to support the end-to-end management, analysis, and visualization of behavioral and clinical data streams. The full-stack MLE/DS will work on studies aimed at advancing the understanding, prediction, and treatment of suicidal thoughts and behaviors. The position involves working on scalable data pipelines, integrating multimodal data (e.g., data from smartphone-based surveys, passive smartphone/wearable monitors, social media platforms, electronic health records), and helping to deploy analytic tools that can generate actionable insights (e.g., visualizations, algorithms) in real time.
The MLE will join a dynamic, multi-site team working at the intersection of machine learning, digital phenotyping, pediatric mental health, and real-time clinical decision support on projects aimed at improving identification of, and intervention on, mental health problems (e.g., suicide) using rich data sources. The successful applicant will have the programming skills and ML expertise to execute tasks independently, along with advanced data management, analysis, and visualization skills. This role is ideal for someone who wants to work on mental health research with real-world implications.
Responsibilities include:
- Work with the research team to support the design, development, and implementation of ML models.
- Support infrastructure for cleaning, processing, analyzing, and visualizing various data types (e.g., GPS data scraped from smartphones, accelerometer data from wearable devices, digital phenotyping data).
- Support experiments to evaluate model performance, perform error analysis, and suggest and implement improvements.
- Conduct higher-level analysis of data and supervise analyses performed by other members of the lab.
- Integrate data across workflows (e.g., digital phenotyping, behavioral, and clinical data).
- Help to develop and support a secure, scalable dashboard or lightweight clinical app that synthesizes data and provides visualizations in real time.
- Deploy modular, reusable visualization components and maintain version-controlled code repositories.
- Work closely with IT teams at the university and at Harvard teaching hospitals to ensure interoperability, reliability, and clinical relevance.
- Assist with preparation of grant applications, presentations, and publications.
What we're looking for
- 3-5+ years of hands-on experience with time-series data, sensor data, or biomedical/wearable data.
- Proficiency in one or more programming languages (Python and/or JavaScript preferred), including libraries for ML (TensorFlow, PyTorch), data engineering (pandas, NumPy), and visualization (Plotly, Dash, Bokeh).
- Experience deploying dashboards or apps (e.g., Dash, Streamlit, React, Flask, or similar).
- Experience with real-time or streaming data pipelines.
- Expert-level knowledge of statistical programming, particularly R (tidyverse, ggplot2) and R Markdown.
- Strong understanding of ML approaches for classification, anomaly detection, and prediction using high-frequency data.
- Experience with multilevel longitudinal data, missing data strategies, and clinical outcome modeling.
- Experience with EHR data, REDCap, Qualtrics, or hospital-based informatics systems.