r/genomics

▲ 35 r/genomics+23 crossposts

Most people who followed $CYDY remember March 30, 2021. The FDA publicly stated that CytoDyn's claims about leronlimab were "misleading and not supported by the data"; no benefit was shown in COVID-19 treatment trials. The stock dropped 25%+ that day.

What happened afterward was a class action lawsuit covering investors who held $CYDY between March 27, 2020 and March 30, 2022.

A $500,000 settlement has been reached and terms are now submitted to the court for approval.

Who qualifies?

Anyone who held $CYDY during the class period and suffered losses from the alleged misrepresentations about leronlimab's effectiveness for HIV and COVID-19.

Can I still apply?

Yes, you can submit your application now and it will be processed once claims filing officially opens after court approval.

If you were damaged by this, don't forget to check your eligibility. GL!

u/JuniorCharge4571 — 2 hours ago

Do you know of reliable Direct-to-Consumer Whole Genome Sequencing (WGS)?

I am interested in doing whole genome sequencing (WGS). Does anyone here have any experience, positive or negative, with current DTC providers?

Prior recommendations no longer seem like a great idea. Nebula has a huge backlog and a dubious financial position. Dante Labs also seems to be collapsing. Sequencing.com uses Chinese labs currently blacklisted by the DOD. Invitae was bought by LabCorp and is no longer DTC. Research providers like All of Us Research seem to have stopped providing people with their WGS results.

Some names that do come up that I am curious about: Psomagen, YSEQ, tellmeGen, SelfDecode, Nucleus Genomics, Sano Genetics.

Disclaimer: This is already in collaboration with my doctor. We are looking for some specific things and having them all go through clinical genomic testing is far more expensive than a DTC 30x WGS test. I do not need any assistance with data interpretation, just need reliable raw data. If a major health risk is flagged, I am prepared to do confirmatory clinical testing.

u/MatchaManiak — 20 hours ago

Getting sequencing data and insights

I recently had a stillbirth at 24 weeks, and one of the issues associated with the timing of the preterm birth is cervical insufficiency, which could be genetic for some people (i.e. collagen deficiencies). It’s really hard to tell, though, because there are many things associated with preterm birth. However, I am curious and want to dig further by looking into my genetics.

My friends have talked about how they uploaded their 23andMe data to ChatGPT and got some findings that resonate with them, which prompted them to take supplements, eat differently, or pay attention to different things.

I’m hoping to learn something about my genetic health risks so that my next pregnancy can be the best it can be (of course, I will also see an MFM high-risk doctor). I’m wondering what kind of sequencing I should do. I’m worried about doing WGS because it’s too much data for ChatGPT to process. Should I do something smaller? Is 23andMe even still around? What do you guys recommend?

u/Aardvark_Adorable — 2 days ago

Nucleus Genomics experience

I just did whole genome sequencing with Nucleus for me and a few family members. Solid A/A- experience - the whole process took a few weeks (fewer than expected) and we received high quality files that I was able to run through a local genomics pipeline to get detailed analysis for the family.

The "-" here is for the Nucleus probability reports and analyses, which are OK, but have the detailed information hidden and are presented in too "risk-forward" a way. They also missed a few things that my local genomics pipeline caught.

In any case, for anyone looking to do WGS for their family, as a first-timer who is technical, I thought this was a very solid offering.

u/baalzephon — 5 days ago
▲ 7 r/genomics+1 crossposts

Random Forest Classifier Training for population structure identification QC in a GWAS analysis

Hello,

I am currently performing a GWAS and am at the quality control stage, more precisely at the "ancestry" analysis. My goal is to select a homogeneous subpopulation to prevent population stratification during the subsequent statistical analysis.

To achieve this, I followed the plinkQC tutorial titled "Training a Random Forest Classifier for Population Structure Identification", using the HapMap Phase III dataset (as suggested in the tutorial).

https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html

I trained my model using 77 individuals per subpopulation, which corresponds to the size of the least represented group (MXL).

https://preview.redd.it/f6ved33thl0h1.png?width=564&format=png&auto=webp&s=d815f571391c0ddcc3fcc7cc47d7e2ae5e0bc18d

I chose this approach to avoid class imbalance, which could bias the classifier. However, the estimated OOB (Out-of-Bag) error rate after training is 22.67%, which is too high (I'm going to select the CEU subpopulation).

https://preview.redd.it/ptdx80mvhl0h1.png?width=652&format=png&auto=webp&s=50d63b8bcc84d1053e0f22c76e0aeb9096b1a5c3

To improve accuracy, I have explored several approaches:

- Principal Component Analysis: I observed that the accuracy of my model increases as I include more PCs.

https://preview.redd.it/meb314rmhl0h1.png?width=2880&format=png&auto=webp&s=d7f840f96358c75b62a9276d75d4a2c1b4aa2dd9

- Sampling Strategy: Using an equivalent proportion per subpopulation rather than a fixed count to maximize the total number of individuals used for training.

- Reference Panel Upgrade: Replacing HapMap III with 1000 Genomes Project Phase III data, which offers a significantly larger sample size (this is my current focus).
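The two sampling strategies from the list above can be sketched as follows. This is only a minimal illustration using the standard library: the function names and the panel sizes are hypothetical (only the 77 MXL individuals come from the post), and it does not touch plinkQC itself. It shows the trade-off in question: a fixed count keeps classes balanced, while a fixed proportion uses more individuals but preserves the original imbalance.

```python
import random
from collections import Counter, defaultdict


def group_by_population(labels):
    """Map each population label to the list of sample indices carrying it."""
    groups = defaultdict(list)
    for idx, pop in enumerate(labels):
        groups[pop].append(idx)
    return groups


def sample_fixed_count(labels, n_per_pop, seed=0):
    """Draw the same number of samples from every population (balanced classes)."""
    rng = random.Random(seed)
    chosen = []
    for pop, idxs in sorted(group_by_population(labels).items()):
        chosen.extend(rng.sample(idxs, min(n_per_pop, len(idxs))))
    return sorted(chosen)


def sample_fixed_proportion(labels, fraction, seed=0):
    """Draw the same fraction from every population (keeps the original imbalance)."""
    rng = random.Random(seed)
    chosen = []
    for pop, idxs in sorted(group_by_population(labels).items()):
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Hypothetical panel sizes for illustration only; apart from the 77 MXL
# individuals mentioned in the post, these are NOT real HapMap III counts.
labels = ["CEU"] * 160 + ["YRI"] * 200 + ["MXL"] * 77

fixed = sample_fixed_count(labels, n_per_pop=77)
prop = sample_fixed_proportion(labels, fraction=0.8)

fixed_counts = Counter(labels[i] for i in fixed)
prop_counts = Counter(labels[i] for i in prop)
print("fixed count:     ", dict(fixed_counts))
print("fixed proportion:", dict(prop_counts))
```

With proportional sampling the training set is larger, but the smallest group is still the smallest, so any classifier trained on it may need class weighting to compensate.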

My questions:

1 - Would using 1000 Genomes Phase III data significantly improve the classifier's accuracy compared to HapMap III?

2 - Are there other reference datasets available that might further enhance the model's accuracy?

3 - Is using a proportion of individuals per subpopulation rather than a fixed count considered a valid practice, and does it effectively improve accuracy?

Note: I should clarify that I am not an ML engineer; I am a Master 2 bioinformatics student. My ultimate objective is to identify variants associated with a specific population through statistical analysis, rather than to achieve a perfectly optimized classifier. While I understand that QC is the most critical stage of a GWAS, unfortunately my current deadlines do not allow me to spend excessive time on this specific step. Thank you for taking this into consideration in your response!

u/Mathyato_ — 2 days ago
▲ 3 r/genomics+1 crossposts

Disclaimer! Illustrative DNA does not use official Davidski G25 coordinates.

This is a follow‑up post regarding the Kenyan Kalenjin results. After getting inconsistent outcomes from Illustrative DNA, I decided to dig deeper. It appears they use scaled coordinates, which in my case were highly inaccurate.

After doing my due diligence, I purchased the official Davidski G25 coordinates, and the results aligned perfectly with my ethnic background. Honestly, charging $30 for scaled coordinates feels excessive, especially since similar data can be found free of charge on various DNA sites.

If you genuinely want to understand your ethnic background, I strongly recommend buying the official Davidski G25 coordinates and analyzing them with tools like Vahaduo or DNA Genics — it’s cheaper and far more accurate.

For those on a budget, LM Genetics K47 is an excellent alternative, particularly for individuals with African heritage. It’s the only calculator that closely matches my raw G25 results.

Honorable mention: Eurogenes K36 also performs decently.

For comparison, the second slide is the result of using the raw Davidski G25 coordinates on Vahaduo (Global G25 PCA), which is super precise considering my Kalenjin ancestry. The third slide uses the same official coordinates on the AfroGeno Modern (Unscaled) IY8 calculator in DNA Genics G25 Studio.

TLDR: If you want official coordinates, request them directly from Davidski.

If you come from a well‑referenced population, Illustrative DNA might still be relatively accurate — but not for the price.

https://preview.redd.it/d6zco4kxmb0h1.jpg?width=1290&format=pjpg&auto=webp&s=914a2e7af38d6d878bace4a041e3091319dfa140

https://preview.redd.it/hofowfdzmb0h1.png?width=515&format=png&auto=webp&s=9516f51fed177eb2f6a73833b17ba6cb7dd9c295

https://preview.redd.it/31ov3s5hnb0h1.png?width=796&format=png&auto=webp&s=7931e193254538a7e4a2e5ade5e5c30a072eb977

u/genealogykenya — 3 days ago
▲ 4 r/genomics+1 crossposts

That it’s free from subtle manipulation?

The target is the (DTC) WGS providers.

So if they did fake it (or some of it) at all, they are clearly skilled enough to bypass basic methods.

I’m not sure whether I’m allowed to mention names, but the company in question provides a BAM file and two FASTQ files (processed, not raw).

u/Express_Ad_6394 — 10 days ago

To those who have spent basically their whole life in science, particularly in genomics: with AI eating jobs every day, which skills in this field should a newbie develop to survive during these times? I sincerely hope to see some informative answers. Thanks!

u/FreeloaderFatso — 8 days ago
▲ 4 r/genomics+1 crossposts

As the title suggests, I did WGS on an isolated strain of Bacillus licheniformis. Yet I have a lot of questions.

To start, I'm a junior in high school. I became very interested in biotechnology and such when I was a freshman and took AP Bio. Our teacher (despite not teaching all that much) decided it would be a good idea to let us have a little AMGEN experience in the classroom. It was really fun and I enjoyed it, so much so that he recommended me to look into the biotechnology field. Fast forward to a couple years later, I joined a biotechnology program at my local community college because our district allows us to dual enroll in college courses while being in high school. I passed biotech 002 and I'm concurrently in biotech 003 where we are allowed to lead our own independent project. From there, my professor suggested I do something on sequencing since I've been fascinated with genetics.

A couple of years before I joined the class, our professor brought different kinds of yogurt to the classroom, and one of them was Chobani. They would extract the bacteria from the yogurts by growing them on plates and isolating the colonies; however, the Chobani one would consistently grow a strain unlike those on the rest of the plates. Later, one of the students performed 16S sequencing of that Chobani isolate and determined it to be Bacillus licheniformis. What interested me the most was how Chobani, which shouldn't contain Bacillus licheniformis, could suddenly have it dominate the growth on the plates.

Nevertheless, I'm still a fair beginner in genetics and biotechnology, and I proceeded with the project. The isolated strain was saved in the ultrafreezer and from there I began the preparation for WGS: streak, obtain an isolated colony, grow in LB broth, and extract DNA. My professor had just recently received some Nanopore equipment, and I used the MinION and a barcoding kit. I prepped my library following the kit protocol and ran the sequencing on the MinION. I only ran it for around a day since the flow cells I had were pretty old to begin with (around 6 months) and there weren't many pores left, so the output plateaued after ~24 hours. Afterwards, I took my FASTQ files, did the downstream processing on usegalaxy.org, and followed the WGS pipeline: concatenate the files, QC with NanoPlot, assemble with Flye, polish the assembly with Medaka, and annotate with Prokka. I did a couple of irrelevant things too, but moving on: I loaded my Prokka FASTA files into Proksee and got the genome map shown in the image of the post.

Looks pretty cool. I also ran antiSMASH and mapped its pathways using KAAS. To be honest, I don't really understand a chunk of my results, but my professor was impressed, so much so that he recommended I publish them. My coverage was around 9x, which is pretty low, but given the equipment I used and the fact that I'm a beginner in everything, I think it was a success because the genome looks pretty well assembled to me.
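For anyone wondering where a figure like "9x" comes from, mean coverage is just total sequenced bases divided by genome length. Below is a minimal sketch with toy read lengths; the genome size of about 4.2 Mb is an assumed round figure, so substitute the length of your own assembly. The helper names are made up for illustration, and N50 is included because it is commonly reported alongside coverage.

```python
def mean_coverage(read_lengths, genome_size):
    """Average depth = total sequenced bases / genome length."""
    return sum(read_lengths) / genome_size


def n50(lengths):
    """Length L such that reads (or contigs) of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Assumed round genome size; replace with your assembly length.
GENOME_SIZE = 4_200_000

# Toy read-length distribution (hypothetical, not real MinION output).
read_lengths = [12_000] * 2_000 + [6_000] * 2_000 + [1_800] * 1_000

print(f"total bases: {sum(read_lengths):,}")
print(f"mean coverage: {mean_coverage(read_lengths, GENOME_SIZE):.1f}x")
print(f"read N50: {n50(read_lengths):,} bp")
```

In practice the read lengths would come from the FASTQ files (NanoPlot reports the same totals), and `n50` can be applied to contig lengths from the assembly as well.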

What's interesting is that this was derived from Chobani yogurt. I compared it to the DSM 13 strain on NCBI and got around a 99.4% match. The remaining 0.6% is what I'd like to look into to see what's different.

But I guess I'm here because I'm pretty much stuck. Yeah, I did do WGS on this but I don't necessarily know what else to do or what I should use to compare my strain to other strains. I should probably publish this to NCBI or other databases but again I'm a complete beginner in terms of this field. What do you guys think? Is this type of dataset suitable for submission to public databases, and if so, what standards should I meet first? What’s the best approach for comparing my strain to reference genomes? Is it worth it to investigate pathways?

u/FxnnyValentine — 10 days ago
▲ 2 r/genomics+1 crossposts

Hi everyone,

I’m an undergraduate working on an RNA-seq dissertation on an insect, and I’m really struggling with how to write up and structure my results. The project mainly concerns fertilisation and the transcripts present.

I have 3 research questions, and for each one I’ve generated key plots (MDS, volcano plots, heatmaps etc.), so in total I’ve got about 9 figures. The analysis itself is done, but when it comes to writing it up, I keep getting stuck.

Every time I draft something, my supervisor says it’s too fluffy and not really helping or interpreting the results properly, which is frustrating because I genuinely don’t know what I’m doing wrong or how to improve it.

I guess my main issues are:

- How do you start writing a Results section for RNA-seq?
- What should you actually say for each plot (beyond just describing it)?
- How much biological interpretation vs description is expected?
- How do you structure it so it’s not repetitive across multiple research questions?

Right now I feel like I’m either:

- just describing what the plot shows (too basic), or
- over-explaining things until it becomes waffle

If anyone has any of these, please share:

- a clear structure/template for writing RNA-seq results
- examples of good Results sections
- advice on how to move from “description” → “real interpretation”

Thanks!

u/Most_Secretary_9146 — 9 days ago

I've seen pictures from DNA testing companies showing their customers their percent heritage, like 10% African, 30% Northern European, 60% Asian, etc. Those are obviously made-up numbers, but the website below has a more extensive discussion of how they do it.

https://genomelink.io/blog/decoding-dna-match-percentages-what-they-reveal-about-your-ancestry

It relates cM to the percentages somehow, but it's not very clear. Can anyone explain it like I'm 5, or is that too much to ask?
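On the cM-to-percent part specifically, the arithmetic is a simple ratio: shared centimorgans divided by the total the company can compare. Here is a minimal sketch, assuming a round total of 7000 cM (companies use somewhat different totals, so treat that constant as an assumption) and the textbook average sharing for a few relationships. Note this covers DNA-match percentages between relatives only; ethnicity percentages are estimated by a different method.

```python
# Assumed round figure for the total comparable length; the real value
# differs between testing companies.
TOTAL_CM = 7000.0

# Average expected sharing for a few relationships (theoretical averages;
# real pairs vary around these values).
EXPECTED_PERCENT = {
    "parent/child": 50.0,
    "grandparent/grandchild": 25.0,
    "first cousin": 12.5,
    "second cousin": 3.125,
}


def percent_shared(shared_cm, total_cm=TOTAL_CM):
    """Convert shared centimorgans into a percentage of the comparable total."""
    return 100.0 * shared_cm / total_cm


def closest_relationship(shared_cm, total_cm=TOTAL_CM):
    """Return the listed relationship whose expected percentage is nearest."""
    pct = percent_shared(shared_cm, total_cm)
    return min(EXPECTED_PERCENT, key=lambda rel: abs(EXPECTED_PERCENT[rel] - pct))


for cm in (3500, 1750, 875):
    print(f"{cm} cM -> {percent_shared(cm):.1f}% -> {closest_relationship(cm)}")
```

Several relationships share the same average (a grandparent and a half-sibling are both about 25%), so a percentage alone narrows things down but does not pin the relationship.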

u/bo_reddude — 12 days ago