I am new to R. While working on my university assignments, I found that legends on base R plots did not show the correct information, so I switched to ggplot2 wherever legends were needed.

I observed that although base R color-codes the data (for example, differentiated by CLASS, as our assignment required), the legend fails to show the right CLASS for each color. For example, if cyan actually represents CLASS A5 in the graph (given the position of the points), the legend may label cyan as CLASS A3. There is no way to know it is wrong until you try the same plot with ggplot2 and compare.

The same error never occurs with ggplot2. I have attached both results and the code for comparison.

I used the code below for base R:

#A scatter-plot of SHUCK versus VOLUME differentiated by CLASS
plot(y=mydata$SHUCK,x=mydata$VOLUME,main = "SHUCK versus VOLUME (differentiated by CLASS)",col=mydata$CLASS, xlab = 'Volume',ylab = 'Shuck', pch=16)
# Add a legend
legend("topleft", legend=levels(mydata$CLASS), pch=16, col=unique(mydata$CLASS))

[Image: base R scatter plot with legend]

If I run similar code using ggplot2, the legend shows a different result. I used the code below for ggplot2:

library(ggplot2)
x <- ggplot(mydata, aes(VOLUME, SHUCK)) + theme_bw()
x + geom_point(aes(fill = CLASS), shape = 23, alpha = 0.75)

[Image: ggplot2 scatter plot with legend]

To clarify further: comparing the two images with their legends, the points that ggplot2 labels as Class A5 (pink) are labeled as Class A3 (cyan) in the base R legend, which is wrong.

I know I am doing something wrong in base R. How should I add a legend in base R so that it stays in sync with the color coding in the graph and correctly identifies the class of each data point for categorical data?

Has anyone experienced the same issue? Any guidance would be helpful. Thanks.

Vanjuli
  • This is really a very weird error. I have never encountered anything like this. Is this happening time and again when you're re-running the code? – Shibaprasadb Nov 29 '21 at 04:59
  • @Shibaprasadb yes – Vanjuli Nov 29 '21 at 07:55
  • The color in the base plot is done in order of the data, not the grouping of your data. Try ordering your data on CLASS and create a factor of CLASS before plotting. That should help. Otherwise, add a dput of your data to the question. – phiver Nov 29 '21 at 10:54
  • @phiver I tried ordering the data by the CLASS column and got the correct legend this time! CLASS was already a factor, though; I just ordered the data. Thanks for the advice, it worked! – Vanjuli Dec 01 '21 at 06:58
  • The code I added to order the data by column CLASS of type FACTOR: `ordered_data <- mydata[order(mydata$CLASS),]` – Vanjuli Dec 01 '21 at 07:00
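
A minimal sketch of why ordering the data fixed the legend, assuming CLASS is a factor whose rows are not sorted by level. The data frame below is a hypothetical stand-in for mydata, and the variable names point_cols, legend_cols_wrong and legend_cols_right are introduced only for illustration. When a factor is passed to col, base R uses its integer level codes as color indices. levels() returns the labels in level order, but unique() returns values in order of first appearance, so the legend labels and legend colors can be paired up differently from the points. Using seq_along(levels(...)) for the legend colors is one possible way to keep them aligned without reordering the data.

```r
# Hypothetical stand-in for mydata: CLASS is a factor, rows are NOT sorted by level
mydata <- data.frame(
  VOLUME = c(10, 20, 30, 40, 50, 60),
  SHUCK  = c(1, 2, 3, 4, 5, 6),
  CLASS  = factor(c("A3", "A1", "A5", "A2", "A4", "A1"),
                  levels = c("A1", "A2", "A3", "A4", "A5"))
)

# plot(col = mydata$CLASS) colors each point by the integer code of its level
point_cols <- as.integer(mydata$CLASS)

# unique() keeps order of first appearance, so these codes are not in level order
legend_cols_wrong <- as.integer(unique(mydata$CLASS))  # 3 1 5 2 4

# One color index per level, in the same order as levels()
legend_cols_right <- seq_along(levels(mydata$CLASS))   # 1 2 3 4 5

plot(mydata$VOLUME, mydata$SHUCK, col = mydata$CLASS, pch = 16,
     xlab = "Volume", ylab = "Shuck")
legend("topleft", legend = levels(mydata$CLASS), pch = 16,
       col = legend_cols_right)
```

Sorting the rows by CLASS first, as in the comment above, has the same effect when every level occurs in the data, because unique() on the sorted column then returns the values in level order.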

0 Answers