
I have a dataset like the one below. I would like to remove all characters after the character ©. How can I do that in R?

data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
                       "© 2013 Chinese National Committee ")

data_clean_df <- as.data.frame(data_clean_phrase)
Ethan
Hamideh

2 Answers


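Use gsub() with a regular expression that matches the character and everything after it. For instance (with @ standing in for ©):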

 rs<-c("copyright @ The Society of mo","I want you to meet me @ the coffeshop")
 s<-gsub("@.*","",rs)
 s
 [1] "copyright "             "I want you to meet me "

Or, if you want to keep the @ character:

 s<-gsub("(@).*","\\1",rs)
 s
 [1] "copyright @"             "I want you to meet me @"

EDIT: If you want to remove everything from the last @ onward, you just need the same approach with a different regex. Because .* is greedy, the captured (.*) extends as far as the last @. Example:

rs<-c("copyright @ The Society of mo located @ my house","I want you to meet me @ the coffeshop")
s<-gsub("(.*)@.*","\\1",rs)
s
[1] "copyright @ The Society of mo located " "I want you to meet me "

Because each of these patterns matches at most once per string (the trailing .* consumes the rest of the string), sub and gsub give the same result here.
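Applied to the data frame from the question, a minimal sketch might look like the following. It assumes the column is the one created by as.data.frame() (named data_clean_phrase) and that the script is read as UTF-8 so that © is matched literally. The new column names before_copyright and up_to_copyright are hypothetical, and the trimws() call is an optional extra that drops the leftover trailing space.

```r
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
                       "© 2013 Chinese National Committee ")
data_clean_df <- as.data.frame(data_clean_phrase)

# Remove the first © and everything after it; trimws() drops the trailing space
data_clean_df$before_copyright <- trimws(sub("©.*", "", data_clean_df$data_clean_phrase))

# Keep the © itself by capturing it and putting it back with \\1
data_clean_df$up_to_copyright <- sub("(©).*", "\\1", data_clean_df$data_clean_phrase)

data_clean_df
```

For the second phrase, which starts with ©, nothing precedes the symbol, so before_copyright is an empty string.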

MASL
  • Thank you. What if I want to do it for the last © in the text? Consider this: c(" © aaa © bbb") --> c("© aaa") – Hamideh Nov 19 '15 at 15:36
  • @HamidehIraj You can make use of [regexes](http://stackoverflow.com/questions/7449564/regex-return-all-before-the-second-occurrence) for executing that. – Dawny33 Nov 19 '15 at 16:26
  • You are welcome. Once you get used to regex, you'll see that removing from the last @ character is just as easy. I've edited my answer to include this case as well. – MASL Nov 19 '15 at 17:27
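Applying the greedy-capture regex to the asker's example from the comments gives a quick check of the "last ©" case. This is a sketch, not part of the original answer; the variable names are illustrative, and trimws() is added on the assumption that the surrounding spaces are not wanted (the expected output in the comment has none).

```r
x <- c(" © aaa © bbb")

# Non-greedy behaviour for contrast: this cuts at the FIRST ©
first_only <- sub("©.*", "", x)            # " "

# Greedy (.*) runs as far right as it can, so it stops before the LAST ©
before_last <- sub("(.*)©.*", "\\1", x)    # " © aaa "

# Strip the surrounding spaces to get the output asked for in the comment
trimws(before_last)                        # "© aaa"
```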

For the sake of completeness, you could also use the stringr package to extract what you want.

library(stringr)
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", 
                       "© 2013 Chinese National Committee ")

str_extract(data_clean_phrase, "^(.*?©)") # up to and including the first ©
str_extract(data_clean_phrase, "^.*(?=(©))") # up to, but excluding, the last ©

Note: I chose str_extract(), but you could also use str_remove() to delete the unwanted part instead.
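If you would rather avoid the stringr dependency, base R's regmatches() with regexpr() can perform a similar extraction. This is a sketch assuming a UTF-8 session; perl = TRUE is needed for the (?=...) lookahead. One design difference worth knowing: unlike str_extract(), which returns NA for a string with no ©, regmatches() silently drops non-matching strings, so the result can be shorter than the input.

```r
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
                       "© 2013 Chinese National Committee ")

# Lazy .*? stops at the first ©; the © is part of the match, so it is kept
including <- regmatches(data_clean_phrase,
                        regexpr("^.*?©", data_clean_phrase, perl = TRUE))

# The lookahead (?=©) requires a © to follow but does not consume it;
# greedy .* means this stops just before the last ©
excluding <- regmatches(data_clean_phrase,
                        regexpr("^.*(?=©)", data_clean_phrase, perl = TRUE))

including
excluding
```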

ToWii