Role overview
You will design, optimize, and productionize machine learning systems using CRC's full internal data environment. This includes tuning an existing NLP engine, developing statistical/ML models (including an ordered probit and regression-based DFS calculator), building internal LLMs trained on CRC documents, and developing the AI analytics chatbot used by all business units.
This role requires someone who ships ML systems, not someone who only builds notebooks.
Responsibilities
- Build, tune, and validate statistical models, including multi-stage regression, ordered probit, and generalized linear models, for audit automation, acuity scoring, and financial forecasting
- Engineer features from structured and unstructured healthcare data (EMR, claims, revenue cycle, clinician notes)
- Tune the existing CRC NLP engine for clinical note understanding, keyword extraction, concept expansion, negation detection, and sentiment scoring
- Build custom clinical embeddings using HuggingFace Transformers, spaCy, and domain-tuned vector models
- Develop and maintain a CRC private LLM, trained on internal knowledge bases, documentation, analytics logic, and care guidelines
- Build automated pipelines for LLM evaluation, retraining, retrieval-augmented generation (RAG), and grounded QA
- Architect, build, and deploy the AI Analytics Chatbot, integrating model logic, business rules, and Fabric/Databricks data sources
- Integrate ML models into production services using notebooks, APIs, or batch inference jobs
- Support creation of AI-generated reporting, insights summaries, and automated clinical/financial narratives
- Build maintainable ML pipelines (training, validation, deployment) using Databricks, Fabric, MLflow, GitHub, and CI/CD
- Implement model monitoring, drift detection, and automated retraining
- Package and deploy reproducible models via APIs or scheduled Fabric/Databricks workflows
- Work with data engineering to embed models into CRC applications
- Partner with BI analysts to transform model outputs into dashboards
- Document methodologies, assumptions, architecture, and validation processes clearly
Qualifications
- 3–6 years of hands-on machine learning engineering experience (not just DS notebooks)
- Strong Python engineering background: pandas, scikit-learn, statsmodels, PyTorch or TensorFlow, transformers, spaCy
- Experience building and tuning LLM and NLP pipelines end-to-end
- Experience with regression, ordered probit/logit, hierarchical models, and general statistical modeling
- Experience deploying ML workloads in Databricks, Azure ML, and Fabric
- Strong SQL for feature engineering and model validation
- Prior experience working with healthcare data (EMR, claims, RCM, CMS) preferred
- Strong communication skills and the ability to explain complex ML systems to non-technical stakeholders
- Proactive, self-managing engineer who can independently own ML systems end-to-end
- Fluent English required
Preferred qualifications
Experience with:
- Retrieval-Augmented Generation (RAG) pipelines
- Vector databases (FAISS, Chroma, Pinecone, Qdrant)
- Enterprise chatbot frameworks
- MLflow, CI/CD, GitHub Actions, and model versioning
- Power BI integration for ML outputs
- FHIR/SMART on FHIR
Certifications:
- Databricks ML Associate/Professional
- Azure AI Engineer Associate
- DeepLearning.AI NLP/LLM specializations