r/AIsafety
A Full List of Risks of AI to Society
A comprehensive list of the major, realistic risk categories that the deployment of AI poses to society.
Florida to open criminal investigation into OpenAI over ChatGPT’s influence on alleged mass shooter
New to AI Governance and stuck in a loop, need help finding a starting point for research and fellowships.
Hi everyone,
I'm looking to pivot my career into AI governance and I'm exploring fellowship opportunities that involve research projects. My background is in production support and a bit of data analysis, with some SQL and Power BI knowledge, but I don't have coding experience.
I've recently discovered AI governance and I'm keen to dive deeper, though I'm aware that understanding frameworks like GDPR, OECD principles, and others will be important.
I recently came across a breakdown of domains in this space: Network Security, Application Security, Security Operations Analysis, Digital Forensics, and GRC (Governance, Risk, and Compliance). I believe GRC would be my most natural entry point given my background.
However, I'm not chasing certifications right now; I'm more interested in hands-on project or research work. My longer-term goal is to potentially move into technical AI safety roles as well, but I want to start with governance first since I'm new to both fields. The challenge is that I'm completely new to both AI governance and machine learning, and the number of frameworks and regulations feels overwhelming. I'm hoping that by working on a research project or paper, I can focus on a specific area and see which frameworks and laws are relevant to my topic, giving me practical experience rather than just theoretical knowledge.
I've looked into courses like the AGI course through Blue Dot and other free options and fellowships, but even those are highly competitive and require demonstrated research projects or prior work, which I don't have yet. I'm stuck in a frustrating loop: competitive courses like Blue Dot require prior research projects to get in, but as someone completely new to the field, I don't know how to start a research project without the guidance those courses would provide in the first place.
I'm currently on a career break due to personal reasons and based in Dubai, but I'm planning to relocate to Germany where my husband is. I'm learning German as well, though I'm still a beginner. I'm also interested in English-speaking AI governance roles or fellowships in Europe that could help me transition into this field.
Can anyone point me toward resources, current research gaps, or guidance on how to identify a research topic? Any advice on where to start or how to approach writing my first research paper would be really helpful. Thanks in advance.
Learning AI Red Teaming from scratch: Anyone want to build/test together?
America wakes up to AI’s dangerous power - After Mythos, a laissez-faire approach is no longer politically tenable or strategically wise
We built a 4-layer architecture to catch AI deception at the neural level — here's how RepE makes it work
Claude, Grok, and I built a framework to detect when AI systems are "performing alignment" (saying one thing while doing another)
The model confirmed why it didn't activate safety protocols. It said so explicitly.
Neural Sovereignty: Reclaiming Your AI from Corporate Control
Direct System Prompt Injection in Claude Code
Hello AI Safety,
Back at the beginning of January I discovered that the system prompt Claude receives from Claude Code is assembled client-side and is not validated.
There also does not appear to be any server-side system prompt beneath the one served in Claude Code. (Codex, by contrast, has a server-side system prompt that supersedes the client-side addenda.)
In my testing I found a 90.5-percentage-point swing in refusal rates across 7 domains with a tailored system prompt replacement.
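For readers unfamiliar with the metric: a percentage-point swing here is simply the difference between the aggregate refusal rate under the default system prompt and under the tailored replacement. A minimal sketch of that calculation, where every domain name and count below is made up for illustration (not the actual test data from this disclosure):

```python
# Hypothetical refusal counts per domain: (prompts refused, prompts sent).
# These numbers are illustrative only.
baseline = {  # default system prompt
    "malware": (19, 20), "weapons": (20, 20), "fraud": (18, 20),
    "privacy": (17, 20), "self_harm": (20, 20), "drugs": (19, 20),
    "extremism": (20, 20),
}
tailored = {  # replaced system prompt
    "malware": (1, 20), "weapons": (2, 20), "fraud": (0, 20),
    "privacy": (1, 20), "self_harm": (3, 20), "drugs": (1, 20),
    "extremism": (2, 20),
}

def refusal_rate(counts: dict) -> float:
    """Aggregate refusal rate (percent) across all domains."""
    refused = sum(r for r, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100.0 * refused / total

swing = refusal_rate(baseline) - refusal_rate(tailored)
print(f"Swing: {swing:.1f} percentage points")
# prints: Swing: 87.9 percentage points
```

With these invented counts the swing comes out to about 87.9 points; the author's reported 90.5-point figure would come from their own per-domain data.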
This was reported to Anthropic in January and marked as informative. I shared it with 404 Media, but that seems to have hit a dead end.
I’m hoping this community finds the work interesting.
If you want to collaborate, I'm open to freelance, contract, and full-time employment.
Regards,
Cassius
A "Sincere" Solution to Deceptive AI: Why the Munafiq Protocol MUST adopt Inference-Time Alignment