Role overview

Index Analytics, LLC, is a rapidly growing, Baltimore-based small business providing health-related consulting services to the federal government. At the center of our company culture is a commitment to fostering a dynamic and employee-friendly workplace. We place a priority on promoting a supportive and collegial team environment and enhancing staff experience through career development and educational opportunities.

Index Analytics is seeking a Data Scientist to support Government clients in the Baltimore and Washington, D.C., metro area. In this role, you will create value from structured and unstructured data by applying domain knowledge, statistical analysis, and advanced machine learning techniques to solve complex healthcare challenges.

This role emphasizes end-to-end development of machine learning and AI systems, including traditional ML, deep learning, NLP, and modern LLM-based architectures such as Retrieval-Augmented Generation (RAG) and agentic AI systems.

What you'll work on

Design, develop, and maintain machine learning and deep learning models, including both traditional (e.g., regression, tree-based models) and neural network-based approaches.
Build and deploy end-to-end ML pipelines on AWS (e.g., SageMaker, S3, Glue) for scalable training, evaluation, and inference.
Develop and implement advanced NLP solutions, including text classification, entity recognition, topic modeling, and semantic search using models such as BERT and transformer-based architectures.
Design, build, and productionize RAG (Retrieval-Augmented Generation) systems, including document ingestion, embedding pipelines, vector search, and LLM orchestration.
Develop LLM-powered applications, including prompt engineering, evaluation frameworks, and optimization techniques for accuracy, consistency, and cost.
Contribute to agentic AI system design, including multi-step reasoning workflows, tool use, and orchestration of LLM-driven agents for complex tasks.
Implement predictive analytics and statistical modeling to uncover patterns, trends, and insights from healthcare data.
Perform data mining and exploratory data analysis (EDA) using state-of-the-art techniques across structured and unstructured datasets.
Build data visualizations, dashboards, and analytical tools to communicate findings clearly to technical and non-technical stakeholders.
Evaluate model performance using appropriate metrics (e.g., accuracy, AUC, precision/recall) and present results in a clear, actionable manner.
Collaborate in an Agile environment with cross-functional teams including engineers, analysts, and stakeholders.
Recommend data-driven solutions and AI strategies aligned with the business needs and healthcare policy objectives of CMS (the Centers for Medicare & Medicaid Services).

Qualifications

U.S. citizen or otherwise authorized to work in the United States and able to demonstrate physical residency in the U.S. for at least three (3) of the past five (5) years. Must be able to obtain a U.S. Federal government client badge and pass a Public Trust clearance.
Master’s degree in Computer Science, Data Science, or a related field required; PhD preferred.
Three (3) or more years of experience as a Data Scientist or in a similar role.
Strong experience in machine learning and statistical modeling, including supervised and unsupervised learning techniques, deep learning, and a solid foundation in probability, hypothesis testing, and regression.
Proven expertise in NLP and text analytics, including transformer-based architectures (e.g., BERT and related models), embeddings, vector databases, and semantic search systems.
Hands-on experience building LLM-powered applications, including prompt engineering, RAG architecture, and ideally agentic workflows or LLM orchestration frameworks, preferably within AWS environments (e.g., Bedrock).
Advanced programming skills in Python (preferred) and/or R, with practical experience using ML and data libraries such as pandas, NumPy, scikit-learn, PyTorch, and TensorFlow.
Strong experience with AWS cloud and MLOps tooling, including SageMaker, S3, Glue, Airflow, and data stores such as Redshift and DynamoDB, along with version control (GitHub) and CI/CD pipelines (e.g., Jenkins).
Experience with backend systems and data integration, including data modeling and supporting APIs for web-based and production applications.
Strong written and verbal communication skills, with the ability to explain complex models and insights clearly.

Tags & focus areas

Remote, AI, Machine Learning, Deep Learning, Data Science, Generative AI