r/LanguageTechnology

working as an AI language engineer on LLM projects - what does the day-to-day actually look like

saw a post about the Amazon AI language engineer role and it got me thinking about the broader picture. from what I can tell, a lot of language engineering work has shifted pretty heavily toward LLM-based stuff - RAG pipelines, agent workflows, fine-tuning smaller models for specific domains, that kind of thing. makes sense given how fast adoption has moved. curious whether people in this space feel like traditional NLP skills (parsing, morphology, the more linguistic side) still matter much day-to-day, or if it's mostly just prompt engineering and orchestration frameworks now. and for anyone who's made the jump from more classical NLP roles into LLM-heavy work, was the transition pretty smooth or did it require a big re-skill?

reddit.com
u/dallsilre — 5 hours ago

TalentCLEF 2026: NLP shared task on Human Resources (evaluation phase open)

Hi all,

I am one of the organizers of TalentCLEF, a shared task (CLEF campaign) focused on evaluating ML systems for talent intelligence problems, using real-world HR data.

We’ve just released the evaluation dataset, and submissions are open until May 3rd.

The tasks include:

  • Job–candidate matching
  • Skill ranking for job descriptions

This is relevant if you’re working on NLP, IR, or LLM-based ranking systems.

If you haven’t started yet, there’s still time. We provide Colab tutorials and an evaluation script so you can get a valid submission quickly.

Even simple baselines are enough to get on the leaderboard and iterate from there!

Here’s the link in case anyone is interested: https://talentclef.github.io/talentclef/docs/ :)

reddit.com
u/luisgasco — 1 day ago

Node-based document processing

Hello, I am considering building a document processing interface that uses nodes to (hopefully) simplify pipeline development for non-technical users. For example, a pipeline would begin with a data ingestion node (PDFs, etc.), then a text recognition node, field extraction, a human-in-the-loop checkpoint, and so on. We would offer a base OCR model built into the software but allow users to connect their own custom models via API. As of now, my idea for the output node is just to save results to local files or send them off via webhook; I'm not too sure about that part yet. I'd be interested in hearing what everyone thinks about this idea.

reddit.com
u/Zealousideal_Coat301 — 2 days ago

ACL 2026 camera-ready submission

Hi, it’s my first time submitting to ACL. At the conferences I’ve submitted to so far, they always sent me the details, like the ISBN and venue information, and then I needed to upload the LaTeX as well.

But now I’m wondering how to add the footnote (i.e., “Proceedings of the nth Annual Meeting of the Association for Computational Linguistics…, vol. 1, page …”). Do we only need to submit the PDF file with the copyright transfer signature? And will this footnote be attached programmatically, like a stamp, to the paper?

I cannot understand the procedure…

reddit.com
u/Dazzling_River_7286 — 5 days ago

Building an open-core Romanian morphological analysis API — looking for feedback

Romanian NLP tooling sits at roughly 15% of what exists for English. The academic resources exist (DEXonline, RoLEX, UD Romanian Treebank) but there's no production-ready REST API for morphological analysis, verb conjugation, or noun declension.

I'm building LexicRo to fill that gap. Pre-development stage, looking for honest feedback on the approach.

Planned endpoints:

  • POST /analyze — token-level morphological analysis (lemma, POS, case, gender, number, person, tense)
  • GET /conjugate/{verb} — full conjugation table across all moods and tenses
  • GET /inflect/{word} — all inflected forms of a noun or adjective
  • GET /lookup/{word} — lexical data from DEXonline
  • POST /difficulty — CEFR level scoring calibrated to Romanian B1/B2 exams

Technical approach:

  • Fine-tuning bert-base-romanian-cased-v1 for morphological tagging
  • verbecc Romanian XML templates for conjugation (extended)
  • Training data: UD Romanian Treebank + RoLEX + DEXonline dump
  • FastAPI service, Docker, OpenAPI spec
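The planned /analyze output maps almost one-to-one onto the UD treebank's CoNLL-U FEATS column, which is what the fine-tuned tagger would be trained to predict. A minimal sketch of that data-prep step (pure Python; the CoNLL-U line below is an illustrative example, not quoted from the treebank):

```python
# Parse one CoNLL-U token line into the morphological record the
# /analyze endpoint would return (and that a fine-tuned tagger predicts).
# CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC

def parse_conllu_token(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    feats = {}
    if cols[5] != "_":  # FEATS is "_" when a token has no features
        for pair in cols[5].split("|"):
            key, value = pair.split("=", 1)
            feats[key] = value
    return {
        "form": cols[1],
        "lemma": cols[2],
        "upos": cols[3],
        "case": feats.get("Case"),
        "gender": feats.get("Gender"),
        "number": feats.get("Number"),
    }

# Illustrative line for "cartea" ("the book"): definite feminine singular noun
line = "2\tcartea\tcarte\tNOUN\tNcfsry\tCase=Acc,Nom|Definite=Def|Gender=Fem|Number=Sing\t_\t_\t_\t_"
token = parse_conllu_token(line)
print(token["lemma"], token["gender"], token["number"])  # → carte Fem Sing
```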

Licence: MIT code, CC BY-NC model weights (free for research). Free tier: 1,000 req/day.

Phase 1 (conjugation + lexical lookup) ships in ~3 months. Morphological analyser follows in phase 2.

Questions I'm genuinely trying to answer:

  1. Is fine-tuning Romanian BERT on the UD treebank (~9k sentences) going to give reliable enough morphological tagging for production use, or do I need more data?
  2. Anyone worked with the RoLEX dataset — is the morphosyntactic annotation consistent enough to use as training data directly?
  3. Are there Romanian NLP resources I'm missing that would be worth incorporating?

Site: lexicro.com | GitHub: github.com/LexicRo

reddit.com
u/gofractal — 3 days ago

AI Language Engineer @ Amazon Interview and Career Prospects

Hi,

I have an interview coming up for this role and wanted to know a few things, if anyone can shed light on them:

  1. Is the live-coding component LeetCode-style, or data prep and text data manipulation (regex, file handling, table transformations, etc.)? The JD honestly describes data analysis more than software engineering, so I'd be surprised at LC, but please correct me if I'm wrong.

  2. I have a more ML-leaning role currently, but I'm tempted by the "Amazon" name since my current company is unknown. I'm worried this job would close doors to future ML eng roles, but from what I see on LinkedIn, there are people who've started as LEs and transitioned into more ML and DS roles. How open is Amazon to lateral movement (i.e., if they don't lay you off first, lol)?

  3. Some posts mention a day-long interview (1 hr × 5 sessions). Are these paid?

Thanks!

reddit.com
u/Competitive-Menu1583 — 4 days ago

Best embedding model for code search in custom coding agent? (March 2026)

I’m building a custom coding agent (similar to Codex/Cursor) and looking for a good embedding model for semantic code search.

So far I found these free models:

  • Qodo-Embed
  • nomic-embed-code
  • BGE-M3

My use case:

  • Codebase search (multi-language)
  • Chunking + retrieval (RAG)
  • Agent-based workflows

My questions:

  1. Which model works best for code search?
  2. Are there any newer/better models (as of 2026)?
  3. Is it better to use code-specific embeddings?

Would appreciate any suggestions or experiences.
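Whichever model you pick, the retrieval step looks the same: embed each chunk once, embed the query, rank by cosine similarity. A model-agnostic sketch with a stand-in bag-of-identifiers embedder (any of the models above would replace `embed`; all names here are hypothetical):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedder: a bag of identifiers. Swap in a real code
    # embedding model (Qodo-Embed, nomic-embed-code, BGE-M3, ...) for
    # actual semantic search; the ranking logic below stays the same.
    return Counter(re.findall(r"[A-Za-z_]\w*", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]  # pre-embed chunks in practice
    return [c for _, c in sorted(scored, key=lambda x: -x[0])[:k]]

chunks = [
    "def read_config(path): return json.load(open(path))",
    "def parse_args(): parser = argparse.ArgumentParser()",
    "class RetryPolicy: max_attempts = 3",
]
print(search("load configuration file", chunks, k=1))  # top hit: the read_config chunk
```

In an agent workflow, the same `search` call becomes the retrieval tool the agent invokes before editing code.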

reddit.com
u/Mountain-Act-7199 — 13 hours ago

Working with BERTopic the first time for thesis

Hi everyone,

I’m a psychology undergraduate currently working on my bachelor’s thesis, where I’m using BERTopic for text analysis. My supervisor unfortunately doesn’t have much experience with coding, so I’m trying to figure things out and optimize my code on my own.

I was wondering if anyone here might have experience with BERTopic (or similar topic modeling approaches) and would be willing to take a quick look at my approach/code?

(And sorry if this is not the right place to ask.)

reddit.com
u/ResearchAreaPsych — 6 days ago

Finetune Llama3.2-1B on GSM8K. How to do better :(

Hi all,

I have been working on fine-tuning Llama3.2-1B on GSM8K for over a month. The best score I can get so far is 22.14 (the baseline is 6.07, evaluated with lm_eval on my server, 8-shot). I've tried adjusting hyperparameters like batch size, learning rate, epochs, warm-up ratio, and the LR scheduler.

Since I am new in this field, I would like to know if there is anything I could do better. Or if this score is the ceiling of Llama3.2-1B.

I appreciate any comment or instruction, thanks!

reddit.com
u/Old-Shelter2517 — 4 days ago

Qwen 3.6-Plus, Agentic Coding, and the Causal Inference Gap

The recent release of Qwen 3.6-Plus, announced mid-May 2024, with its 1M context window and enhanced agentic coding capabilities, has naturally amplified discussions around truly autonomous agents. The excitement is palpable; the prospect of an LLM not just generating code but orchestrating complex execution pipelines, identifying errors, and self-correcting promises a significant shift in development paradigms, particularly for software engineering tasks.

However, this very autonomy introduces a subtle, yet profound, causal inference challenge that often gets overlooked. When an agent self-corrects based on an observed outcome, are we witnessing true causal reasoning, or merely sophisticated correlation mapping within its vast parameter space? My experience across thousands of A/B tests in financial tech suggests a critical distinction. A system designed to optimize for a metric often learns the what and when, not the why.

The 1M context window, while impressive for synthesizing observational data, doesn't inherently imbue the model with a counterfactual understanding. If an agent refactors code and a performance metric improves, it observed an association. It did not necessarily intervene on the true causal lever in a way that generalizes robustly outside its immediate operational context. The risk lies in attributing causal agency where only predictive excellence exists, potentially leading to brittle systems that fail when an unobserved covariate shifts. For me, the real leap will be when these agents can articulate and rigorously test specific causal hypotheses, not just optimize via iterative trial and error.

reddit.com
u/clairedoesdata — 6 days ago

Why do most live translation tools still fall apart in actual two-way conversations?

Had a supplier call last month that made me realize how bad most “live translation” setups still are in real conversations.

It was about 45 minutes, neither of us was speaking in our first language, and by the end I felt more tired from trying to understand the call than from the call itself.

Half the time I was squinting at auto-captions. The other half I was copying lines into another tab just to make sure I wasn’t misunderstanding something important.

Which obviously doesn’t work when you’re supposed to be having an actual back-and-forth conversation.

So I went down a rabbit hole on this and the main thing I realized is that most people lump very different use cases together.

A presentation and a conversation are not the same problem.

If one person is speaking and everyone else is listening, subtitles are usually enough. You can share a caption feed, people follow along, done.

But once it turns into a real two-way meeting, subtitles alone start slowing everything down. You’re reading, processing, replying, and the timing gets awkward fast. It’s manageable, but it doesn’t feel natural.

That’s the part I don’t think most product pages explain clearly.

For an actual conversation, translated voice output matters way more than I expected. Hearing the other person in your own language is just a very different experience from trying to keep up through captions.

The problem is that most built-in meeting tools seem to stop at captions.

Teams, Meet, Zoom, etc. all have something in this category now, but once I started looking closer, a lot of the default options felt more useful for:

  • major language pairs
  • one-way meetings
  • bigger enterprise setups

…not really for a small supplier call where two people just need to speak normally without getting stuck in caption-reading mode.

That’s where I kept running into the same gap:
some tools are good at subtitles,
some are good at event-style interpretation,
but not many seem designed for a normal small meeting where you want both:

  • translated subtitles
  • and translated voice at the same time

While digging around, one of the tools I came across was TransGull, and what caught my attention was that it seemed closer to that exact use case — small online meetings where you want subtitles on screen and translated voice through headphones, without rebuilding the whole meeting workflow around a conference-style setup.

That felt more relevant to what I was actually trying to solve than a lot of the bigger “enterprise interpretation” tools.

My takeaway at this point is basically:

  • subtitles are fine for presentations
  • two-way meetings are a different technical problem
  • and most tools are better at one than the other

Curious what other people here are using, especially for less common language pairs.

And for anyone who’s used translated voice in live calls: did it actually make the conversation feel more natural, or did you still end up leaning on subtitles most of the time?

reddit.com
u/shinigami__0 — 9 days ago

Gothenburg vs Manchester vs Uppsala for Computational Linguistics

Hello! I've been accepted to two programs and I'm struggling to decide between Gothenburg and Manchester. I'm also on the waitlist to study at Uppsala. I would love to hear from students or anyone who has knowledge about these schools.

  • University of Gothenburg - MA in Language Technology
    • Fee-exempt student because I'm an EU citizen
  • University of Manchester - MSc in Corpus and Computational Linguistics
    • International student (37k euros)
  • University of Uppsala - MA in Language Technology
    • Fee-exempt student
    • On reserve

While I have enough funds for Manchester, and my parents are willing to cover any living costs I'd need to pay, it's still quite an investment.

Here are some of the things I've achieved during my BA:

  • Constructed a corpus of direct speech (ELAN, phonological transcription, basic report on our methodology)
  • Built a static website using HTML/CSS, and currently I'm learning C# and JS
  • Extracted selected words and phrases from our corpus, eliminating discourse markers, disfluencies, and unnatural structures, using Python with pandas and stanza
  • Created a Wordle and a Phrasle game using Python with tkinter, among other modules.

reddit.com
u/Practical-Cup7292 — 13 days ago

Speech models feel fine until you put them in real conversations

Been working around conversational data recently, and this keeps showing up.

Most speech datasets are too clean compared to actual usage.

In real conversations (especially multilingual ones):

  • people interrupt each other
  • there's overlapping speech
  • code-switching happens mid-sentence
  • context jumps quickly

But training data usually assumes clean turns and stable language.

That mismatch starts to show up fast when you plug models into real workflows.

Feels less like a model limitation and more like a data distribution problem.

Would be interested to hear how others here are handling this, especially if you’re deploying in multilingual or noisy environments

reddit.com
u/Cautious-Today1710 — 12 days ago

UBC MDS in Computational Linguistics - networking, projects, lab opportunities?

Hello all, I recently received an admission offer from the Master of Data Science in Computational Linguistics program at UBC in Vancouver. I am not sure this program is what I'm looking for and have the following questions. I would really like to hear what past or current students think!

  • Has the program provided good opportunities to network with people working in comp ling/NLP?
  • Besides the capstone project, are there other projects in the curriculum that could be shown in a portfolio/on a resume?
  • Are there opportunities to work in a lab/do research during or after the program? I saw there is an NLP group at UBC, but it's in the computer science department, so I'm wondering whether MDS-CL students are able to get involved there or in something similar.

Thanks! (cross-posted)

reddit.com
u/pearlxthunder — 12 days ago

Resolving Semantic Overlap in Intent Classification (Low Data + Technical Domain)

Hey everyone,

I’m working on an intent classification pipeline for a specialized domain assistant and running into challenges with semantic overlap between categories. I’d love to get input from folks who’ve tackled similar problems using lightweight or classical NLP approaches.

The Setup:

  • 20+ functional tasks mapped to broader intent categories
  • Very limited labeled data per task (around 3–8 examples each)
  • Rich, detailed task descriptions (including what each task should not handle)

The Core Problem:
There’s a mismatch between surface-level signals (keywords) and functional intent.
Standard semantic similarity approaches tend to over-prioritize shared vocabulary, leading to misclassification when different intents use overlapping terminology.

What I’ve Tried So Far:

  • SetFit-style approaches: Good for general patterns, but struggle with niche terminology
  • Semantic anchoring: Breaking descriptions into smaller units and using max-similarity scoring
  • NLI-based reranking: As a secondary check for logical consistency

These have helped somewhat, but high-frequency, low-precision terms still dominate over more meaningful functional cues.
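The "should not handle" parts of your task descriptions can be folded straight into the max-similarity scorer: score each intent by its best positive-anchor match minus a penalty from its negative anchors. A minimal sketch with a stand-in Jaccard similarity (a sentence encoder would replace `sim`; the intents and anchors below are made up for illustration):

```python
import re

def sim(a: str, b: str) -> float:
    # Stand-in similarity: Jaccard over word sets. Replace with a
    # sentence-encoder cosine similarity for real use.
    ta, tb = set(re.findall(r"\w+", a.lower())), set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def score_intent(query: str, positive: list[str], negative: list[str], alpha: float = 1.0) -> float:
    pos = max(sim(query, p) for p in positive)
    neg = max((sim(query, n) for n in negative), default=0.0)
    return pos - alpha * neg  # negative anchors pull the score down

intents = {
    "export_report": {
        "positive": ["export the monthly report", "download report as pdf"],
        "negative": ["schedule the report email"],
    },
    "schedule_report": {
        "positive": ["schedule the report email", "send report every monday"],
        "negative": ["download report as pdf"],
    },
}

query = "download the report as a pdf"
best = max(intents, key=lambda name: score_intent(query, **intents[name]))
print(best)  # → export_report
```

The penalty weight `alpha` is a tunable knob, and because every anchor contributes a concrete similarity score, each decision stays inspectable: you can print the per-anchor scores that produced a classification.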

Constraints:
I’m trying to avoid using large LLMs. Prefer solutions that are more deterministic and interpretable.

Looking For:

  • Techniques for building a signal hierarchy (e.g., prioritizing verbs/functional cues over generic terms)
  • Ways to incorporate negative constraints (explicit signals that should rule out a class) without relying on brittle rules
  • Recommendations for discriminative embeddings or representations suited for low-data, domain-specific settings
  • Any architectures that handle shared vocabulary across intents more robustly

If you’ve worked on similar problems or have pointers to relevant methods, I’d really appreciate your insights!

Thanks in advance 🙏.

reddit.com
u/Formal-Author-2755 — 8 days ago