Role overview
We are seeking a highly skilled Data Scientist to design and develop data pipelines, AI/ML models, and workflows on our 50+ alternative data panels. The ideal candidate will have deep expertise in mathematical and statistical modeling and will have built models with Python, SQL, and PySpark. This person will test new data assets for the firm, develop AI/LLM tools and agents, contribute to the firm’s analytics library, and use traditional machine learning and statistical methods to improve panel data. M Science expects its data scientists to implement production code, so the ideal candidate will have experience writing well-tested, performant, object-oriented code.
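As a sketch of what "well-tested, object-oriented production code" might look like in this role, here is a minimal, hypothetical pandas transform with the kind of unit test that would live alongside it in an analytics library. All class, column, and function names are illustrative assumptions, not part of M Science's actual codebase.

```python
import pandas as pd

class PanelDeduplicator:
    """Illustrative production-style transform: drop duplicate panel rows,
    keeping the most recently delivered record. Names are hypothetical."""

    def __init__(self, key_columns: list, timestamp_column: str):
        self.key_columns = key_columns
        self.timestamp_column = timestamp_column

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Sort by delivery time so keep="last" retains the newest record per key
        return (
            df.sort_values(self.timestamp_column)
              .drop_duplicates(subset=self.key_columns, keep="last")
              .reset_index(drop=True)
        )

# A unit test as it might appear in the library's test suite
def test_keeps_latest_delivery():
    df = pd.DataFrame({
        "panel_id": [1, 1, 2],
        "delivered_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        "value": [10.0, 11.0, 20.0],
    })
    out = PanelDeduplicator(["panel_id"], "delivered_at").transform(df)
    # Panel 1 keeps the newer delivery (11.0); panel 2 is untouched
    assert out.set_index("panel_id")["value"].to_dict() == {1: 11.0, 2: 20.0}

test_keeps_latest_delivery()
```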
What you'll work on
- Develop agentic workflows for automated insight retrieval, data analysis, data download, and data forecasting
- Contribute to the firm’s documented, unit-tested analytics library
- Process, cleanse, and verify the integrity of data used for analysis
- Create automated alerting and notification systems for deviations in data quality, validation failures, or unusual patterns
- Evaluate new datasets for the firm
- Design, develop, and optimize scalable and fault-tolerant data ingestion pipelines using Databricks, Airflow, Python, and Spark
- Build resilient data pipelines that handle vendor-related issues such as delayed deliveries, schema changes, incomplete records, and data corruption
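The data-quality and resilience bullets above could be sketched as a validation step that runs on each vendor delivery before ingestion, flagging schema changes and incomplete records for alerting. This is a minimal pandas illustration under assumed column names and an assumed 5% null-rate threshold; a production version would feed an alerting system rather than return a list.

```python
import pandas as pd

# Hypothetical expected schema for a vendor delivery (illustrative only)
EXPECTED_COLUMNS = {"panel_id": "int64", "date": "datetime64[ns]", "value": "float64"}

def validate_delivery(df: pd.DataFrame) -> list:
    """Return a list of data-quality issues found in a vendor delivery."""
    issues = []
    # Schema check: catch vendor-side column renames or drops
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    # Completeness check: flag columns with too many incomplete records
    for col in sorted(set(EXPECTED_COLUMNS) & set(df.columns)):
        null_rate = df[col].isna().mean()
        if null_rate > 0.05:  # illustrative threshold
            issues.append(f"{col}: null rate {null_rate:.1%} exceeds 5%")
    return issues

sample = pd.DataFrame({
    "panel_id": [1, 2],
    "date": pd.to_datetime(["2024-01-01", None]),
    "value": [1.0, 2.0],
})
print(validate_delivery(sample))  # flags the 50% null rate in "date"
```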
What we're looking for
- Advanced Python for data processing, scripting, and automation
- Fluency in PySpark/Spark and distributed data processing; experience with pandas, Dask, Daft, or Polars is also a plus
- Excellent knowledge of multivariate statistical analysis, including but not limited to ordinary least squares, principal component analysis, factor analysis, linear discriminant analysis, and panel methods
- Excellent knowledge of other ML methods, including additive modeling and ensemble modeling
- Some knowledge of LLMs and common LLM orchestration frameworks like LangChain and LangGraph
- Experience with named entity resolution methods a strong plus
- Familiarity with cloud data platforms (e.g., AWS) and cloud-based storage solutions
- Strong troubleshooting skills to diagnose and resolve performance bottlenecks in data pipelines
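Two of the statistical methods named above, ordinary least squares and principal component analysis, can be sketched in a few lines of NumPy. The coefficients, sample sizes, and noise level below are synthetic assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Ordinary least squares via a least-squares solver ---
# Synthetic data: y = 2*x0 - 1*x1 + small noise (coefficients are illustrative)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=200)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# --- Principal component analysis via SVD of the centered data ---
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(Xc) - 1)  # variance along each principal axis

print(np.round(beta, 2))  # recovers roughly [2, -1]
```

SVD returns singular values in descending order, so `explained_variance` is already sorted by how much variance each component explains.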