Every pharmacovigilance database I try puts up a different wall. Is this study feasible without institutional access?
I posted in r/AskAcademia a week ago about being stuck on IRB funding for an independent public health study on caffeine product labeling. I got a lot of feedback telling me to slow down, get a faculty sponsor, and start with the systematic review before trying to collect primary data. I took that advice, but now I keep hitting new walls, and I am starting to feel like I am missing something obvious.
The "novel" contribution of the study is dose-tier stratification of caffeine adverse events. Caffeinated products vary enormously in caffeine content (a cup of coffee might have 80 mg, while a pre-workout might have 400 mg), but no public database categorizes adverse event reports by how much caffeine was actually in the product involved. I hypothesize that if labeling failures are driving harm, increases in adverse events should be concentrated in the highest-dose products, whose caffeine content consumers are least able to estimate accurately.
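To make the stratification concrete, here is a minimal sketch of tier assignment, assuming four tiers defined by caffeine content per serving. The cutoffs (100, 200, 300 mg), the per-serving basis, and the function name `dose_tier` are placeholders for illustration, not the protocol's actual definitions.

```python
from bisect import bisect_right

# HYPOTHETICAL cutoffs in mg of caffeine per serving; real tier boundaries
# would come from the study protocol, not from this sketch.
# Tier 1: < 100 mg, tier 2: 100 to < 200, tier 3: 200 to < 300, tier 4: >= 300.
TIER_CUTOFFS_MG = [100, 200, 300]


def dose_tier(caffeine_mg: float) -> int:
    """Map a product's caffeine content per serving to a tier from 1 to 4."""
    if caffeine_mg < 0:
        raise ValueError("caffeine content cannot be negative")
    # bisect_right counts how many cutoffs are <= caffeine_mg.
    return bisect_right(TIER_CUTOFFS_MG, caffeine_mg) + 1


print(dose_tier(80))   # cup of coffee example -> tier 1
print(dose_tier(400))  # pre-workout example -> tier 4
```

The hard part is not this function but getting a reliable milligram value for each product, which is exactly where brand identification comes in.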
The systematic review is registered on PROSPERO and moving forward. The survey arm is parked until I land a faculty sponsor. The database analysis is where I keep running into problems.
I pulled the publicly available HFCS data (the FDA food and dietary supplement adverse event database, formerly known as CAERS). After filtering for caffeine-relevant products and ages 12-24 from 2014 to 2024, I have 238 records. The data includes brand names, so tier mapping is theoretically possible, but 238 records spread across 11 years and 4 tiers are too sparse for the regression I designed the analysis around. The trend also goes down rather than up, which may reflect changes in reporting patterns rather than actual exposure trends.
NPDS has the volume I need. A 2025 paper found over 32,000 caffeine energy product exposures in NPDS from 2011 to 2023 among individuals under 20. I am submitting a formal non-member data request right now. The problem I just hit is that getting brand-level product identifiers requires a written authorization letter from each brand owner. Without brand names, I cannot map products to dose tiers, and the whole point collapses.
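A quick back-of-envelope comparison shows why the two sources differ so much for a year-by-tier analysis. This sketch assumes records are spread evenly over 4 tiers and every calendar year (the most generous case; real data will be lumpier), treats the paper's "over 32,000" as a floor, and ignores that the NPDS figure covers under-20s while the HFCS filter covers ages 12-24, so the two numbers are not directly comparable. The helper name `mean_records_per_cell` is just for illustration.

```python
def mean_records_per_cell(n_records: int, first_year: int, last_year: int,
                          n_tiers: int = 4) -> float:
    """Average records per year-by-tier cell if records were spread evenly."""
    n_years = last_year - first_year + 1  # both endpoint years are included
    return n_records / (n_years * n_tiers)


# HFCS pull: 238 records, ages 12-24, 2014-2024 (11 years x 4 tiers = 44 cells).
hfcs = mean_records_per_cell(238, 2014, 2024)
# NPDS figure from the 2025 paper: "over 32,000", under 20, 2011-2023
# (13 years x 4 tiers = 52 cells); 32,000 is used as a floor.
npds = mean_records_per_cell(32_000, 2011, 2023)

print(f"HFCS: {hfcs:.1f} records per cell")
print(f"NPDS: at least {npds:.0f} records per cell")
```

This prints about 5.4 records per cell for HFCS versus at least 615 for NPDS. Even in the even-spread case, HFCS leaves very little per cell, and any skew toward particular years or tiers would likely push some cells close to zero.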
I am requesting Poisindex product ID codes without brand names and plan to resolve the code-to-product lookup once I have institutional access after transferring to a four-year university. But that could be a year away, and I am not sure the study holds together in the meantime.
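One way to keep the analysis moving while the lookup is unresolved is to build it keyed on product codes and plug in a code-to-tier table later. This is a minimal sketch under that assumption; the record layout, the placeholder codes (`P001` and so on), the partially filled `code_to_tier` table, and `tier_counts` are all hypothetical and do not reflect the actual NPDS or Poisindex formats.

```python
from collections import Counter

# HYPOTHETICAL records as they might arrive: product codes only, no brand names.
# The codes are made-up placeholders, not real Poisindex identifiers.
records = [
    {"product_code": "P001", "year": 2019},
    {"product_code": "P002", "year": 2019},
    {"product_code": "P001", "year": 2021},
    {"product_code": "P003", "year": 2022},
]

# Filled in later, once each code can be resolved to a product and its
# caffeine content. Here only one code has been resolved so far.
code_to_tier: dict[str, int] = {"P001": 4}


def tier_counts(records: list[dict], code_to_tier: dict[str, int]) -> Counter:
    """Count records per tier; codes with no tier yet are counted as 'unresolved'."""
    counts: Counter = Counter()
    for rec in records:
        counts[code_to_tier.get(rec["product_code"], "unresolved")] += 1
    return counts


print(tier_counts(records, code_to_tier))
```

Unresolved codes stay in the output instead of being dropped, so the share of records that cannot be mapped to a tier remains visible as its own number.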
I want to be clear that I am not complaining about the difficulty. I knew going in that this would be hard (as many of you also told me), and I have no illusions about my limitations as a first-year community college student doing this without institutional support. But I have put a significant amount of work into this, and I am afraid that the limitations I keep uncovering are compounding to the point where this whole arm of my project is not executable in its current form. I would rather hear that now from people who know more than I do than find out after another few months of work.
Is there a framing of this question that gets around the brand identification problem? Is there a database I have not found that captures caffeinated product adverse events with dose information already attached? Is the surveillance gap itself the publishable finding rather than the trend analysis I designed? Am I missing a perspective entirely?