Hiring Senior Software Engineers for AI Evaluation & Benchmarking Work
Seeking experienced engineers with strong Python/software engineering backgrounds for remote AI evaluation and coding benchmark projects.
Ideal candidates should have:
• 4+ years of software engineering experience
• Strong Python proficiency
• Experience working with large codebases/repositories
• Familiarity with data pipelines, testing, debugging, and evaluation workflows
• Experience in high-performance engineering environments is a plus
Work involves:
• Evaluating AI-generated code
• Building/working with coding benchmarks
• Testing reasoning/debugging capabilities of frontier AI systems
• Technical analysis and structured evaluation tasks
Important:
All candidates must pass a technical screening test before onboarding. This is not beginner-level work.
This is a remote, contract-based engagement.
DM with:
• Resume/LinkedIn/GitHub
• Years of experience
• Languages/frameworks you work with
• Relevant benchmark/evaluation/LLM experience (if any)