Role overview

We are seeking a highly skilled LLM Engineer to assist in the development of a multi-modal Large Language Model (LLM) pipeline for digitizing geotechnical bore log data. This role is critical to transforming unstructured PDF documents into structured, machine-readable JSON outputs that support downstream analytics, GIS integration, and AI-powered search.

You will work closely with a Project Manager and our client's technical stakeholders to build, fine-tune, and evaluate a custom LLM solution capable of interpreting complex geotechnical documents from multiple vendors.

Responsibilities

Fine-tune a multi-modal LLM (e.g., Pixtral-12B, PaliGemma, Gemma 3) using annotated bore log PDFs and JSON samples.
Build preprocessing pipelines for page segmentation, figure isolation, and normalization of units and soil classifications.
Develop and implement an evaluation framework covering precision, recall, and F1, domain-specific metrics, and JSON schema conformance.
Test model generalization on bore logs from 3 additional vendors.
Identify and categorize failure cases.
Compare performance across vendors and recommend strategies for scaling.
Package preprocessing scripts, model artifacts, and evaluation dashboards into a reproducible workflow.
Deliver structured JSON outputs and final benchmark reports.
Provide all source code and documentation for handoff.

Basic qualifications

Proven experience fine-tuning and deploying multi-modal LLMs (e.g., Pixtral, LLaMA, Gemma)
Experience with Ollama/llama.cpp, MongoDB or other non-relational databases, and AI coding tools (Cursor, Windsurf, Copilot)
Experience using open-source (OSS) models
Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow)
Experience with OCR, image preprocessing (OpenCV), and document parsing
Familiarity with geospatial data and JSON schema design
Ability to work with GPU environments (e.g., A100s) and cloud-based training setups
Strong understanding of evaluation metrics and model benchmarking
Excellent communication and documentation skills
Experience with geotechnical or engineering datasets
Familiarity with MongoDB, vector search, and embedding-based retrieval
Exposure to MLOps practices and CI/CD for ML pipelines
Prior work in AI document ingestion or enterprise-scale data transformation

Tags & focus areas

Used for matching and alerts on DevFound

AI, Machine Learning, MLOps, Generative AI, PyTorch, TensorFlow, Full-time

LLM Engineer