Lead Data Engineer + AI

Remote, USA · Full-time · Posted Recently

Job Description

Client: Altimetrik Takeda
Location: Remote
Minimum 3 years of experience as a Lead is required.

About the role

We're looking for a Senior Data Engineer to build and scale our Lakehouse and AI data pipelines on Databricks. You'll design robust ETL/ELT, enable feature engineering for ML/LLM use cases, and drive best practices for reliability, performance, and cost.

What you'll do

• Design, build, and maintain batch/streaming pipelines in Python + PySpark on Databricks (Delta Lake, Auto Loader, Structured Streaming).
• Implement data models (Bronze/Silver/Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT/Jobs, monitoring, alerting).
• Enable ML/AI: feature engineering, MLflow experiment tracking, model registries, and model/feature serving; support RAG pipelines (embeddings, vector stores).
• Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
• Collaborate with Data Science/ML and Product to productionize models and AI workflows; champion CI/CD and IaC.
• Troubleshoot performance and cost issues; mentor engineers and set coding standards.

Must-have qualifications

• 10+ years in data engineering with a track record of production pipelines.
• Expert in Python and PySpark (UDFs, window functions, Spark SQL, Catalyst basics).
• Deep hands-on Databricks experience: Delta Lake, Jobs/Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
• Strong SQL and data modeling (dimensional, medallion, CDC).
• ML/AI enablement experience: MLflow, feature stores, model deployment/monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt/response logging).
• Cloud proficiency on AWS/Azure/GCP (object storage, IAM, networking).
• CI/CD (GitHub/GitLab/Azure DevOps), testing (pytest), and observability (logs/metrics).

Nice to have

• Databricks Delta Live Tables, Unity Catalog automation, Model Serving.
• Orchestration (Airflow/Databricks Workflows), messaging (Kafka/Kinesis/Event Hubs).
• Data quality & lineage tools (Great Expectations, OpenLineage).
• Vector DBs (FAISS, pgvector, Pinecone), RAG frameworks (LangChain/LlamaIndex).
• IaC (Terraform), security/compliance (PII handling, data masking).
• Experience interfacing with BI tools (Power BI, Tableau, Databricks SQL).
