Brooksource

LLM Engineer


Actively hiring · Posted 6 months ago

LLM Engineer – Geotechnical Data Digitization

Location: Remote

Engagement Type: Contract (Full-Time, 40 hours/week)

Duration: Initial 8–10 weeks, with a strong likelihood of extension into future phases (up to 1 year total)

Role Overview:

We are seeking a highly skilled LLM Engineer to help develop a multi-modal Large Language Model (LLM) pipeline for digitizing geotechnical bore log data. This role is critical to converting unstructured PDF documents into structured, machine-readable JSON that supports downstream analytics, GIS integration, and AI-powered search.

You will work closely with a Project Manager and technical stakeholders at our client to build, fine-tune, and evaluate a custom LLM solution capable of interpreting complex geotechnical documents from multiple vendors.

Key Responsibilities:

Phase 1 – Pilot Development

  • Fine-tune a multi-modal LLM (e.g., Pixtral-12B, PaliGemma, Gemma 3) using annotated bore log PDFs and JSON samples.
  • Build preprocessing pipelines for page segmentation, figure isolation, and normalization of units and soil classifications.
  • Develop and implement an evaluation framework covering precision, recall, and F1, domain-specific metrics, and JSON schema conformance.

Cross-Vendor Generalization

  • Test model generalization on bore logs from three additional vendors.
  • Identify and categorize failure cases.
  • Compare performance across vendors and recommend strategies for scaling.

Pipeline Packaging & Handoff

  • Package preprocessing scripts, model artifacts, and evaluation dashboards into a reproducible workflow.
  • Deliver structured JSON outputs and final benchmark reports.
  • Provide all source code and documentation for handoff.

Required Qualifications:

  • Proven experience fine-tuning and deploying multi-modal LLMs (e.g., Pixtral, Llama, Gemma)
  • Experience with Ollama/llama.cpp, MongoDB or other non-relational databases, and AI coding tools (e.g., Cursor, Windsurf, GitHub Copilot)
  • Experience using open-source (OSS) models
  • Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow)
  • Experience with OCR, image preprocessing (OpenCV), and document parsing
  • Familiarity with geospatial data and JSON schema design
  • Ability to work with GPU environments (e.g., A100s) and cloud-based training setups
  • Strong understanding of evaluation metrics and model benchmarking
  • Excellent communication and documentation skills

Preferred Qualifications (nice to have):

  • Experience with geotechnical or engineering datasets
  • Familiarity with MongoDB, vector search, and embedding-based retrieval
  • Exposure to MLOps practices and CI/CD for ML pipelines
  • Prior work in AI document ingestion or enterprise-scale data transformation
