u/Extra-Campaign7281

I'm working on a project where I'll be fine-tuning LLMs with GRPO on a 170K-sample dataset for explainable LJP (legal judgment prediction, where the model predicts case outcomes and generates step-by-step reasoning citing the facts). I'm considering models like GPT OSS 20B or Qwen 3.5 27B, with a slight preference for Qwen 3.5 27B because of its strong reasoning capabilities.

I recently obtained a 96GB VRAM workstation (RTX PRO 6000) to handle the RL rollouts, which should give some solid headroom for larger models.

What are your recommendations for the best open-source models for GRPO fine-tuning in 2026? Any advice on structuring explainable LJP rewards would also be appreciated.

Thanks!

I recently obtained a 96GB VRAM workstation (RTX PRO 6000) to handle the RL rollouts, which should give some solid headroom for larger models.

What are your recommendations for the best open-source models for GRPO fine-tuning in 2026? Any advice on structuring explainable LJP rewards would also be appreciated.

Thanks!

Best models to tune with GRPO for my use case?

Best models to tune with GRPO for my use case?