r/RStudio

Creating a stacked bar chart with a complex data set - advice please

Update: Has been solved, thank you for all the responses

Hi everyone,

Everyone has been so kind and helpful, so I am asking one last question that the internet, unfortunately, could not answer for me...

I would like to create a stacked bar chart with a complex dataset. My dataset looks a little like this:

Work   Group 1a   Group 1b   Group 2a   ... (up to 9)
yes    0          1          0          ...
no     1          0          0          ...
...

I have tried to use this explanation online, but I am unsure what to add for "points" in the code.

# create data frame
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 3),
                 position = rep(c('Guard', 'Forward', 'Center'), times = 3),
                 points = c(14, 8, 8, 16, 3, 7, 17, 22, 26))

#view data frame
df

  team position points
1    A    Guard     14
2    A  Forward      8
3    A   Center      8
4    B    Guard     16
5    B  Forward      3
6    B   Center      7
7    C    Guard     17
8    C  Forward     22
9    C   Center     26



library(ggplot2)

ggplot(df, aes(fill = position, y = points, x = team)) +
  geom_bar(position = 'stack', stat = 'identity')

Further explanation:

I am trying to map which students have time for leisure so the dataset looks as follows:
'Work' answers the question "Do you work?" with yes or no.
Group 1a would be: "Yes, I have time for leisure and my parents support me."
Group 1b would be: "Yes, I have time for leisure and my parents don't support me." If a person falls into a category I assigned a 1, if they don't a 0; this applies to all of the groups (up to 9).

I would like to have all the groups on the x-axis and the answers to "Do you work?" stacked for each group.

Would the best approach be to group the yes/no answers, count the values for each group, and then build the stacked bar chart from those counts?

Unfortunately, since it has taken me a while to relearn a lot about R and there was a lot of data to present and organise, I am now in a bit of a time crunch: I only have today to finish all my graphs, so I don't have as much time as I would like to try out different approaches. I'd appreciate any help you can give me.
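A minimal sketch of that counting approach, using hypothetical column names (Work, Group1a, ...) that mirror the table described above, with tidyr/dplyr/ggplot2:

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# hypothetical data mirroring the Work / Group 1a / Group 1b ... layout
df <- data.frame(Work    = c("yes", "no"),
                 Group1a = c(0, 1),
                 Group1b = c(1, 0),
                 Group2a = c(0, 0))

# reshape to long format, then count the 1s per group and per answer
counts <- df %>%
  pivot_longer(-Work, names_to = "Group", values_to = "value") %>%
  summarise(count = sum(value), .by = c(Group, Work))

# one bar per group, segments stacked by the yes/no answer
p <- ggplot(counts, aes(x = Group, y = count, fill = Work)) +
  geom_col()   # geom_col() stacks by default
```

The key idea is that `pivot_longer()` turns the wide 0/1 columns into a `Group`/`value` pair, after which the counts are a plain grouped sum.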

reddit.com
u/fuckpineapplepizza — 1 day ago

Trying to add labels of count to my stacked bar chart

Hi everyone,

Thank you to everyone who has taken the time to help me before; I really, really appreciate it. Since I don't know the language that well yet and I am very much learning by doing as I finish my current project, I struggle to find the errors when I apply the answers others were given online.

I have created a stacked bar chart and I would very much like to add counts to the columns.

surveyresponses_Freizeit_Master_for_stacked %>%
  pivot_longer(-Arbeit, names_to = "Group", values_to = "value") %>%
  summarise(count = sum(value), .by = c("Arbeit", "Group")) %>%
  ggplot(aes(Group, count, fill = Arbeit)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))

This would be my code, and it produced the graph as I need it... However, when I tried to add the count, based on this code that tackled a similar problem:

# Source - https://stackoverflow.com/a/63656093
# Posted by stefan, modified by community. See post 'Timeline' for change history
# Retrieved 2026-05-13, License - CC BY-SA 4.0

library(ggplot2)

ggplot(mtcars, aes(cyl, fill = factor(gear))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = after_stat(count)),
    stat = "count", position = "fill"
  )

I receive this result:

Browse[1]> surveyresponses_Freizeit_Master_for_stacked %>%
+   pivot_longer(-Arbeit, names_to = "Group", values_to = "value") %>%
+   summarise(count = sum(value), .by = c("Arbeit", "Group")) %>%
+   ggplot(aes(Group, fill = Arbeit, label = after_stat(count)), stat = "count") +
+   geom_col() +
+   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))
Error during wrapup: Problem while mapping stat to aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! Aesthetics must be valid computed stats.
✖ The following aesthetics are invalid:
• `label = after_stat(count)`
ℹ Did you map your stat in the wrong layer?

I understand the error message, but I am not sure what I need to change to get the desired result... Again, I appreciate any help!
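For reference, a minimal sketch of one way to resolve this: `label = after_stat(count)` and `stat = "count"` belong inside a layer such as `geom_text()`, not inside `ggplot()`. And since the pipeline above already summarises the data, no count stat is needed at all; the existing `count` column can be labelled directly (hypothetical data standing in for the summarised output):

```r
library(dplyr)
library(ggplot2)

# hypothetical pre-summarised data, shaped like the pipeline's output
dat <- tibble(Group  = rep(c("Group 1a", "Group 1b"), each = 2),
              Arbeit = rep(c("Ja", "Nein"), times = 2),
              count  = c(3, 5, 2, 4))

p <- ggplot(dat, aes(Group, count, fill = Arbeit)) +
  geom_col() +
  geom_text(aes(label = count),
            position = position_stack(vjust = 0.5))  # centre each label in its segment
```

`position_stack(vjust = 0.5)` places each label in the middle of its stacked segment rather than at the segment boundary.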

u/fuckpineapplepizza — 11 hours ago

How do you access disabled functions in R packages? I'm trying to use some functions from wehoop and hoopR

I'm doing a thesis on understudied defensive effects on win percentages and salary in the WNBA and NBA. I found R packages called wehoop (WNBA) and hoopR (NBA) that had load functions for all the stats I wanted to use (shown in the code blocks below). I went to run them today, and on the women's side it says the package doesn't exist in the 3.0.0 version of wehoop and makes blank data tables in the 2.0.0 version. The hoopR version is also creating blank tables with all zeroes.

Does anyone know how to access the old functions and data? I think the functions were disabled or something in the newer versions and I can't get the old versions to work.

wnba_boxscorehustlev2

nba_leaguehustlestatsplayer
u/Lucky-Efficiency-644 — 4 hours ago

glmbayes is now on CRAN — Bayesian GLMs with familiar glm() syntax, no MCMC required

I've just published glmbayes to CRAN. The motivation was simple: I wanted Bayesian inference for standard GLMs without the overhead of learning Stan, JAGS, or brms.

The syntax mirrors base R's glm() almost exactly:

# Frequentist
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Bayesian — iid posterior samples, same formula interface
ps  <- Prior_Setup(counts ~ outcome + treatment)
fit <- glmb(counts ~ outcome + treatment,
            family  = poisson(),
            pfamily = dNormal(mu = ps$mu, Sigma = ps$Sigma))

summary(fit)  # posterior summaries, credible intervals

A few things that might be interesting:

  • Uses iid accept-reject sampling (Nygren & Nygren, 2006) on log-concave likelihoods — no chains, no warmup, no convergence diagnostics. Every draw is independent, so ESS = n.
  • Supports Gaussian, Poisson, Binomial, and Gamma families.
  • S3 interface mirrors glm(): summary(), predict(), residuals() all work as expected.
  • Passes checks across all 10 CRAN flavors (Linux/Windows/macOS, devel/release/oldrel).
install.packages("glmbayes")

Feedback very welcome — especially from anyone who has tried to introduce Bayesian methods in a teaching context where MCMC complexity is a barrier.

u/Bucksswede — 7 days ago

msummary p-values different from the p-values of my models

Hi!

I'm making summary tables for a set of linear mixed models using the msummary function and the kableExtra package. My problem is that the p-values given by the msummary function I use to build my tables are not the same as the ones in my models. I understand that msummary has a different way of calculating the p-values than summary(lmer), but I really need the p-values from my actual models, and I can't figure out how to get msummary to calculate/extract those. Does anyone have an idea what I could do to fix that?

Here's my code:

modeltable <- msummary(models,
                    output = "kableExtra",
                    statistic = c(
                      "SE = {std.error}"),
                    stars = c('*' = .05, '**' = .01, '***' = .001),
                    coef_map = coef_map,
                    gof_map = NA,
                    add_rows = add_rows,
                    fmt = 3,
                    escape = FALSE)

Many thanks!
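For reference, modelsummary documents a `tidy_custom.CLASS()` override mechanism: if such a method exists for the model class, its columns replace the corresponding statistics in the table. A sketch, assuming the models are lmerTest fits (class `lmerModLmerTest`), that feeds msummary the same p-values `summary(lmer)` reports:

```r
library(modelsummary)

# assumption: models are lmerTest fits; modelsummary picks up a user-defined
# tidy_custom.<class>() method and overrides matching statistics with its columns
tidy_custom.lmerModLmerTest <- function(x, ...) {
  s <- as.data.frame(coef(summary(x)))  # lmerTest's coefficient table with Pr(>|t|)
  data.frame(term = rownames(s), p.value = s[["Pr(>|t|)"]])
}
```

With this method defined, the existing `msummary(models, ...)` call should pick up the model's own p-values without further changes.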

u/aNervousBiologist — 1 day ago

Pay someone to do my homework

To pass my RStudio class I need to finish this project. It's pretty straightforward, but I'm too lost and it's too far gone to learn now. Please help!! I'll pay.

u/Vegetable_Ad_6369 — 1 day ago

Calculating percentages

Hi everyone,

Thank you for your help last time with finding the problem in my plotly code. I am now struggling with calculating percentages and getting a tidy, usable table using the mutate() function. Unfortunately, none of the tutorials online seem to work for me, and I don't understand what I am doing wrong.

```
surveyresponses_Freizeit_Master_count = surveyresponses_Freizeit_Master %>%
  count(.$`Bleibt genug Zeit für Freizeit?`)
```

After this I receive a table where all the answers per group have been counted. What I need is an additional column with the percentages adding up to 100%, and I am not sure how to get there... Could anyone please help? I would really like to learn how to do it and truly understand it, because while I could do it by hand, I have two more datasets I need to do this for. I appreciate any help.
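A minimal sketch of the percentage column, using a hypothetical stand-in for the survey data: since `count()` returns a column `n`, `mutate()` only needs to divide each `n` by the total.

```r
library(dplyr)

# hypothetical stand-in for surveyresponses_Freizeit_Master
answers <- tibble(`Bleibt genug Zeit für Freizeit?` = c("Ja", "Ja", "Ja", "Nein"))

counts <- answers %>%
  count(`Bleibt genug Zeit für Freizeit?`) %>%
  mutate(percent = n / sum(n) * 100)   # percentages across the rows sum to 100
```

Inside `mutate()`, `sum(n)` is the total of all counts, so `n / sum(n) * 100` gives each answer's share of the whole.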

https://preview.redd.it/ks6vlv7xyn0h1.png?width=874&format=png&auto=webp&s=0a81cc3ef285846578dbd28c60a9f586736983d5

u/fuckpineapplepizza — 1 day ago

How much S7 is my R package?

Hi everyone,

I’ve been exploring the new S7 object-oriented programming system and decided to build a proper project to learn its mechanics. I created `{linkfunctions7}`, a package that implements a framework for link functions entirely using S7.

Since S7 is still relatively new and best practices are still emerging, I would love to get some feedback from developers who have more experience with it or with R package development in general.

Any critiques or suggestions are incredibly welcome. I really want to make sure I am writing actual S7 code rather than just forcing S3/R6 habits into a new syntax.

Thanks in advance for your time!
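For readers new to S7, a minimal sketch of the class/generic/method pattern such a package builds on (hypothetical names, not taken from `{linkfunctions7}`):

```r
library(S7)

# a tiny S7 class with one validated character property
Link <- new_class("Link", properties = list(name = class_character))

# a generic dispatching on its first argument, plus a method for Link
link_fun <- new_generic("link_fun", "x")
method(link_fun, Link) <- function(x, mu) log(mu / (1 - mu))  # logit, as an example

logit <- Link(name = "logit")
link_fun(logit, 0.5)
```

Properties are read with `@` (e.g. `logit@name`), and `method(generic, class) <-` is the idiomatic S7 way to register methods, in contrast to S3's naming convention.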

u/Pool_Imaginary — 1 day ago

How to add a trend line to a specific data series, along with the equation of the line?

Hey guys, I'm writing some code to generate a trend line, but when I use this code, the line becomes misaligned and appears to be offset (I don't know why). I'd also like to know if there's a command or method for creating a trend line similar to the one shown in Excel. I used the following code:

lm_c1 <- subset(c1, Time %in% c(2, 3, 6))

C1 <- ggplot(c1, aes(x = Time, y = LnOP)) +
  geom_smooth(data = lm_c1, method = "lm", se = FALSE, color = "#082E8B", linewidth = 0.8) +
  geom_line(color = "#8B6508") +
  geom_point(shape = 21, fill = "#EEAD0E", size = 1.5, color = "#CD950C", stroke = 1.5) +
  scale_y_continuous(labels = function(x) sprintf("%.3f", x)) +
  scale_x_continuous(breaks = seq(0, 60, by = 2)) +
  theme_few() +
  theme(axis.title.y = element_text(margin = margin(r = 35)),
        plot.margin = margin(10, 5, 5, 5),
        text = element_text(family = "Times New Roman"))
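For an Excel-style equation label, one approach is to fit the `lm()` yourself and annotate the plot with the coefficients. A sketch with hypothetical data standing in for `c1` (columns `Time`, `LnOP`):

```r
library(ggplot2)

# hypothetical data standing in for c1
c1 <- data.frame(Time = c(2, 3, 6, 10, 20), LnOP = c(0.5, 0.8, 1.7, 2.9, 5.8))
lm_c1 <- subset(c1, Time %in% c(2, 3, 6))

# fit the same model geom_smooth() fits on the subset, then build the label by hand
fit <- lm(LnOP ~ Time, data = lm_c1)
eq <- sprintf("y = %.3f + %.3f x,  R\u00b2 = %.3f",
              coef(fit)[1], coef(fit)[2], summary(fit)$r.squared)

p <- ggplot(c1, aes(Time, LnOP)) +
  geom_smooth(data = lm_c1, method = "lm", se = FALSE) +
  geom_point() +
  annotate("text", x = min(c1$Time), y = max(c1$LnOP), label = eq, hjust = 0)
```

Note that because `geom_smooth()` only receives `lm_c1`, the fitted line spans only those `Time` values, which can make it look offset against the full series drawn by `geom_line()`.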

u/Ender_MQ — 3 days ago

After I use the command install.packages("vegan") I get the following message:

Warning message:
In utils::install.packages("vegan") :
installation of package ‘vegan’ had non-zero exit status

I've also tried downloading it manually, but no success.

Is the new RStudio update the issue? Thanks in advance for any help.

u/Cerradinho — 11 days ago

Hi everyone,

I have tried finding out what the issue is for quite some time now, but since I am not the most proficient with technology, I am struggling to find anything applicable.

My goal was to make a simple pie chart with plotly, but for some reason the code that I used previously and copy-pasted always comes back with the 'unexpected symbol' error. I have tried finding a punctuation error or a misspelling, but nothing. I downloaded the new version of RStudio today and I think I might be missing some packages, but I also don't know which ones they might be, and my research did not yield anything. I have installed tidyverse, plotly, dplyr, ggplot2 and readxl.

I used the code from this website and adjusted it for my dataset

https://www.geeksforgeeks.org/r-language/how-to-create-pie-chart-using-plotly-in-r/

plotly::plot_ly(data = surveyresponses_Freizeit_Bachelor_count, values = ~n,
                labels = ~factor(Bleibt genug Zeit für Freizeit?),
                marker = list(colors = c("green", "orange", "blue")),
                type = "pie") %>%
  layout(title = "Bleibt genug Zeit für Freizeit im Bachelor?")

I look forward to any input and thank you all in advance for your help. Maybe it's something really stupid, that I just didn't see...
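For reference, one likely cause of the 'unexpected symbol' error here: a column name containing spaces and a `?` is not a syntactic R name, so `factor(Bleibt genug Zeit für Freizeit?)` cannot parse; non-syntactic names need backticks. A sketch with a hypothetical stand-in for the count data frame:

```r
library(plotly)

# hypothetical stand-in for surveyresponses_Freizeit_Bachelor_count
counts <- data.frame(`Bleibt genug Zeit für Freizeit?` = c("Ja", "Nein", "Teils"),
                     n = c(10, 5, 3),
                     check.names = FALSE)

p <- plot_ly(data = counts,
             values = ~n,
             labels = ~factor(`Bleibt genug Zeit für Freizeit?`),  # backticks around the name
             marker = list(colors = c("green", "orange", "blue")),
             type = "pie") %>%
  layout(title = "Bleibt genug Zeit für Freizeit im Bachelor?")
```

`check.names = FALSE` keeps the non-syntactic column name intact when building the example data frame.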

u/fuckpineapplepizza — 5 days ago

Hi. I heard the company making RStudio also made Positron. Both are free code editors. My question: how do companies like this one make a living? What is their business model when everything seems free?

thx

u/Top-Vacation4927 — 6 days ago

So I'm doing a research project with data from a recent poll. (Posting on a burner account just in case I'm not supposed to ask)

The news claims Incumbent wins 45% of the vote, challenger wins 38%, 15% undecided.

Removing identifiers in case I'm not allowed to share.

If the election for X from Z were held today, who would you vote for if the

candidates were…

  1. Incumbent
  2. Challenger
  3. Someone else (please specify): _______ [VOL]
  4. Wouldn’t vote [VOL]

Range is 1-4.

Output (summary) =

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  1.000   1.000   2.000   1.562   2.000   4.000      99 

Output (table)

 1   2   3   4 
318 356   9   4 

Napkin math of actual polling (i.e. Incumbent getting 318, Challenger 356, etc) =

Incumbent - 46.28820961%

Challenger - 51.81950509%

Someone Else - 1.31004367%

Wouldn't Vote - 0.58224163%
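The napkin math above can be reproduced directly in R from the table() counts:

```r
# counts taken from the table() output above
counts <- c(Incumbent = 318, Challenger = 356,
            `Someone else` = 9, `Wouldn't vote` = 4)

# prop.table() divides each count by the total of the non-NA responses
round(100 * prop.table(counts), 2)
# → Incumbent 46.29, Challenger 51.82, Someone else 1.31, Wouldn't vote 0.58
```

Note the denominator here is the 687 valid responses; the 99 NA's are excluded, which is one place where a poll's published percentages and a raw table() can diverge.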

Am I doing something wrong or do I need to email my professor? Lol

u/Clever_Comrade — 14 days ago

Hello everyone !

A colleague of mine is working with quite a big dataframe (compared to what we're used to) and asked for my help to get some analysis running.

She's trying to compare the expression of 15 different genes between 4 groups (A, B, C, D), with each group having between 12 and 15 individuals (so something like 800 rows and 4 columns total). Basically, her dataframe looks like this:

Condition   Gene    Expression
A           GENE1
B           GENE1
C           GENE1
D           GENE1
A           GENE2
B           GENE2
C           GENE2
D           GENE2
A           GENE3
B           GENE3
C           GENE3
D           GENE3

For her analysis, we're going with an ANOVA + TukeyHSD, but we were wondering if there was a way to basically loop them so that it would go into the dataframe, group by Gene, then by Condition, and apply both tests to the Expression column.

My first thought was to go with :

data |>
  dplyr::group_by() |>
  dplyr::summarise()

But since both aov() and TukeyHSD() output tables/matrices, it kind of complicates the whole deal.

My next thought was to use a for loop, but I suck with those

Does anyone know if it's even possible to begin with ?
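It is possible; since the test outputs don't fit a summarise() column, one common pattern is to split the data frame by gene and collect the fits in a named list. A sketch with hypothetical simulated data in the Condition/Gene/Expression shape:

```r
# hypothetical long-format data: Condition, Gene, Expression
set.seed(1)
df <- expand.grid(Condition = c("A", "B", "C", "D"),
                  Gene = paste0("GENE", 1:3),
                  rep = 1:12)
df$Expression <- rnorm(nrow(df))

# one ANOVA + Tukey per gene, collected in a list named by gene
results <- lapply(split(df, df$Gene), function(d) {
  fit <- aov(Expression ~ Condition, data = d)
  list(anova = summary(fit), tukey = TukeyHSD(fit))
})

results$GENE1$tukey  # pairwise comparisons for a single gene
```

`split()` + `lapply()` sidesteps the "tables inside summarise()" problem entirely: each list element holds both full test objects for one gene.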

Thanks in advance

u/Intelligent-Gold-563 — 13 days ago

Hi everyone,

I have a question about how people usually program neural networks and deep learning models in R/RStudio.

Is there a common way to do this without using keras3, since it relies on a Python environment in the background?

For example, do people use pure torch, luz, mlr3torch, or any other R-native packages that do not depend on Python?

Or, in practice, do most people avoid R for this type of work and go directly to Python instead?

I would appreciate any guidance, especially from people who have experience building neural networks in R.

u/Random_Arabic — 11 days ago

Hi everyone,

I wanted to open a discussion about how people here structure their R projects for clinical/research analyses, especially for prospective and retrospective studies.

In my last project I started using the {targets} package (tar_make(), pipelines, dependency tracking, reproducibility, etc) and honestly it was probably the cleanest project architecture I've ever had. It made the workflow much easier to maintain and rerun without manually tracking which scripts depended on others.

With this package, I really liked the idea of treating the analysis as a pipeline rather than a collection of disconnected scripts.

Now I'm curious how other people here organize their projects: Do you have a personal framework/template you reuse? How do you avoid "script spaghetti" as projects grow?

Would love to hear how more experienced users structure their workflow and what practices ended up scaling well over time
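For anyone who hasn't seen {targets}, a minimal `_targets.R` sketch of the pipeline idea described above (hypothetical file and step names):

```r
# _targets.R — minimal pipeline sketch with hypothetical file/step names
library(targets)

list(
  tar_target(data_file, "data.csv", format = "file"),  # tracked file: downstream steps
                                                       # re-run when it changes
  tar_target(raw,   read.csv(data_file)),
  tar_target(model, lm(y ~ x, data = raw))
)
```

Running `tar_make()` then executes only the targets whose upstream dependencies changed, which is what replaces manual tracking of which scripts depend on which.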

u/kernel-236 — 7 days ago

Hi, I am doing analysis with some large datasets in RStudio (~67GB in total). While it takes a while (a few hours) to load in the data, as I have to unzip a data file, I can then work with the data fine. However, if I close the session (which happens automatically when I close my laptop), I can't reopen it; it just comes up with an error message. I know 67GB is large, but surely people work with much larger files? To note, the server is my university's server, not my own, but I can log in on my own device. Any help is greatly appreciated, as I can't spend a few hours every day re-loading my data.
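One common pattern for avoiding repeated slow loads is to cache the parsed object with saveRDS()/readRDS(), so later sessions read a fast binary file instead of re-unzipping and re-parsing. A small self-contained sketch (tempfiles standing in for the real paths):

```r
# tempfiles standing in for the real data and cache paths
csv_path   <- tempfile(fileext = ".csv")
cache_path <- tempfile(fileext = ".rds")
write.csv(data.frame(x = 1:3), csv_path, row.names = FALSE)

# cache the expensive load: later sessions read the fast binary cache
if (file.exists(cache_path)) {
  big_data <- readRDS(cache_path)
} else {
  big_data <- read.csv(csv_path)     # the slow step in the real workflow
  saveRDS(big_data, cache_path)
}
```

This doesn't fix the session-reopen error itself, but it turns a multi-hour reload into a single fast readRDS() call.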

u/Direct-Mention-4124 — 10 days ago

Hey... I need help ^^

I don't know what I'm doing wrong, but I surely can't help myself.

My code does not count any "Übrige Sachschadensunfälle"; they're just zero.

But it worked with every other kind of accident.
I looked into the data and I found multiple correct combinations.

Can somebody help me? =(

My Code:
library(tidyverse)

su1 <- read.csv(
  "C:\\Users\\...\\nutzbar\\46241-0011_de_flat.csv",
  sep = ";", stringsAsFactors = FALSE
)

su2 <- read.csv(
  "C:\\Users\\...\\nutzbar\\46241-0012_de_flat.csv",
  sep = ";", stringsAsFactors = FALSE
)

su1 <- su1 %>%
  mutate(unfallart = X5_variable_attribute_label,
         altersgruppe = X3_variable_attribute_label,
         geschlecht = X2_variable_attribute_label)

su2 <- su2 %>%
  mutate(unfallart = X6_variable_attribute_label,
         altersgruppe = X4_variable_attribute_label,
         geschlecht = X3_variable_attribute_label)

su <- bind_rows(su1, su2)

su <- su %>%
  mutate(
    value = na_if(value, "-"),
    value = na_if(value, "-0"),
    value = as.numeric(value)
  )

str(su)
summary(su)

schwere <- su$unfallart
schwere_ord <- factor(schwere,
                      levels = c(
                        "Unfälle mit Personenschaden",
                        "Schwerwiegende Unfälle mit Sachschaden i.e.S",
                        "Sonst. Unfälle unter dem Einfluss berausch. Mittel",
                        "Übrige Sachschadensunfälle"
                      ),
                      ordered = TRUE)

# if main responsible party
hauptverur <- su$value_variable_label

# no NA & no "Insgesamt"
pim <- su$geschlecht
pim <- na_if(pim, "Ohne Angabe")
pim <- na_if(pim, "Insgesamt")

alta <- su$altersgruppe

# definition of the age groups
alta_ord <- factor(alta,
                   levels = c(
                     "unter 15 Jahre",
                     "15 bis unter 18 Jahre",
                     "18 bis unter 21 Jahre",
                     "21 bis unter 25 Jahre",
                     "25 bis unter 35 Jahre",
                     "35 bis unter 45 Jahre",
                     "45 bis unter 55 Jahre",
                     "55 bis unter 65 Jahre",
                     "65 bis unter 75 Jahre",
                     "75 Jahre und mehr"
                   ),
                   ordered = TRUE)

alta_ord[alta_ord == "Alter unbekannt"] <- NA

plo1 <- su %>%
  filter(unfallart == "Übrige Sachschadensunfälle") %>%
  mutate(alta_ord = factor(altersgruppe,
                           levels = levels(alta_ord),
                           ordered = TRUE)) %>%
  filter(!is.na(alta_ord)) %>%
  group_by(alta_ord) %>%
  summarise(anzahl = sum(value, na.rm = TRUE))

ggplot(plo1, aes(x = alta_ord, y = anzahl)) +
  geom_point(size = 3) +
  # y-axis: nice, readable scaling
  scale_y_continuous(
    breaks = scales::pretty_breaks(n = 10),
    labels = scales::label_number(big.mark = ".", decimal.mark = ",")
  ) +
  # x-axis: keep labels from overlapping
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)  # slightly slanted
  ) +
  labs(
    x = "Altersgruppe",
    y = "Anzahl Übrige Sachschadensunfälle",
    title = "Übrige Sachschadensunfälle nach Altersgruppen"
  )

plo2 <- su %>%
  mutate(
    schwere_ord = factor(unfallart,
                         levels = c(
                           "Unfälle mit Personenschaden",
                           "Schwerwiegende Unfälle mit Sachschaden i.e.S",
                           "Sonst. Unfälle unter dem Einfluss berausch. Mittel",
                           "Übrige Sachschadensunfälle"
                         ),
                         ordered = TRUE),
    geschlecht = geschlecht
  ) %>%
  filter(geschlecht %in% c("männlich", "weiblich")) %>%
  group_by(schwere_ord, geschlecht) %>%
  summarise(anzahl = sum(as.numeric(value), na.rm = TRUE), .groups = "drop")

ggplot(plo2, aes(x = schwere_ord, y = anzahl, fill = geschlecht)) +
  geom_col(position = "dodge") +
  scale_y_continuous(
    labels = scales::label_number(big.mark = ".", decimal.mark = ",")
  ) +
  scale_fill_manual(
    values = c("männlich" = "blue", "weiblich" = "red")
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  labs(
    x = "Unfallart",
    y = "Anzahl",
    fill = "Geschlecht",
    title = "Unfälle nach Unfallart und Geschlecht"
  )

u/pennypaier — 11 days ago

I need to analyze PDF research papers for word frequencies. I'm pretty green when it comes to RStudio and have only used it for statistics with an Excel file, so I'm super confused about how to convert the PDF file to a text file for data extraction. I understand that the tm package is used for this, but I'm having a hard time finding resources on how to convert the document and filter for word frequency, with some words being viewed as multi-word units (i.e. "climate change" over "climate" and "change").
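A minimal sketch of one approach: extract the text (e.g. with pdftools::pdf_text, shown commented out with a hypothetical file name), then protect chosen multi-word units by joining them with an underscore before tokenising, so "climate change" counts as one token.

```r
# library(pdftools)                                    # pdf_text() returns one string per page
# txt <- paste(pdf_text("paper.pdf"), collapse = " ")  # hypothetical file name

# sample stand-in text so the rest of the sketch runs as-is
txt <- "Climate change and climate policy: climate change impacts"
txt <- tolower(txt)

# protect chosen multi-word units before tokenising
txt <- gsub("climate change", "climate_change", txt, fixed = TRUE)

# tokenise on anything that isn't a letter or underscore, then tabulate
words <- unlist(strsplit(txt, "[^a-z_]+"))
words <- words[nchar(words) > 0]
freq  <- sort(table(words), decreasing = TRUE)
freq  # "climate_change" is counted as one token
```

The same protect-then-tokenise idea works inside a tm pipeline too, as a content transformation applied before building the document-term matrix.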

u/First-Ad-862 — 11 days ago