The substring() function in R is widely used to extract characters from a string or to modify them. With it you can easily pull out the characters you need and also replace values within a string.
Hello folks, hope you are doing well. Today let’s focus on the substring() function in R.
Substring: We can do several things with substrings, such as extracting values and replacing values. For this we use the base R functions substr() and substring().
substr(x,start,stop)
substring(x,first,last=1000000L)
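Where:
x is the character vector to work on.
start / first is the position of the first character to extract or replace.
stop / last is the position of the last character to extract or replace. In substring(), last defaults to 1000000L, which in practice means the end of the string.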
Well, I hope the syntax is clear. Now, let’s extract some characters from a string using the substring() function in R.
#returns the characters from 1-11
df<-("Journal_dev_private_limited")
substring(df,1,11)
Output = “Journal_dev”
#returns the characters from 1-7
df<-("Journal_dev")
substring(df,1,7)
Output = “Journal”
Congratulations, you just extracted part of the given string. As you can see, the substring() function in R takes the first and last positions as arguments and returns the characters between them, both ends included. Positions in R start at 1.
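One difference between the two signatures shown earlier is worth a quick sketch: substring() has a default for last, while substr() needs an explicit stop. The snippet below is a minimal illustration in base R; the variable names s, tail_part and same_part are made up for this example.

```r
# Illustrative string (the same text used in the first example)
s <- "Journal_dev_private_limited"

# substring(): 'last' defaults to 1000000L, so this runs to the end of the string
tail_part <- substring(s, 13)

# substr(): 'stop' has no default, so pass the string length explicitly
same_part <- substr(s, 13, nchar(s))

print(tail_part)                       # "private_limited"
print(identical(tail_part, same_part)) # TRUE
```

Leaving out last is handy when you only know where the part you want begins.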
With the help of the substring() function, you can also replace characters in a string with values of your choice. Sounds interesting, right? Then let’s see how it works.
#returns the string by replacing the _ by space
df<-("We are_developers")
substring(df,7,7)=" "
df
Output = “We are developers”
#string replacement
df<-("R=is a language made for statistical analysis")
substring(df,2,2)=" "
df
Output = “R is a language made for statistical analysis”
Great, you did it! In this way, you can replace characters in a string with the value you want.
In the above cases, you replaced the ‘_’ (underscore) and the “=” (equals sign) with a " " (space). I hope that makes it clearer.
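One detail of this kind of replacement is easy to miss: it overwrites characters in place and never changes the length of the string. The sketch below uses made-up strings x and y to show what happens when the replacement value is longer or shorter than the range being replaced.

```r
# Replacement longer than the target range: only as many characters as fit are used
x <- "abcdef"
substring(x, 2, 3) <- "XYZW"
print(x)  # "aXYdef"

# Replacement shorter than the target range: only that many characters are overwritten
y <- "abcdef"
substring(y, 2, 4) <- "Z"
print(y)  # "aZcdef"
```

If the replacement has to change the length of the string, a pattern-based function such as sub() is usually the better tool.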
So far, so good! But what if you need to make the same replacement in every string of a vector?
Don’t worry! The replacement form of substring() works on whole vectors, so the change is applied to all the strings at once.
Let’s see how it works!
#replaces the 4th letter of each string by $
df<-c("Alok","Joseph","Hayato","Kelly","Paloma","Moca")
substring(df,4,4)<-c("$")
df
Output = “Alo$” “Jos$ph” “Hay$to” “Kel$y” “Pal$ma” “Moc$”
Oh, what happened? The 4th character of every string has been replaced by the ‘$’ sign!
Well, that is substring() for you. It replaces the characters at the given positions with the value we supply.
In the above case, the same replacement was applied to each of the input strings in one call. It’s incredible, right? I say yes. What about you?
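Extraction and replacement can also be combined on a whole vector. As a small sketch, the snippet below capitalises the first letter of each string; names_vec is a made-up vector for illustration.

```r
# Hypothetical lower-case names
names_vec <- c("alok", "joseph", "hayato")

# Extract the first character of each string, upper-case it, and write it back
substring(names_vec, 1, 1) <- toupper(substring(names_vec, 1, 1))

print(names_vec)  # "Alok" "Joseph" "Hayato"
```

Here the replacement value is itself a vector, so each string receives its own new first letter.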
So far we have worked on plain character vectors. Now, we will look at extracting characters from the columns of a data frame as well.
Let’s see how it works!
We can create a data frame with sample data that has 2 columns, namely Technologies and Popularity. Let’s extract some specific characters from this data. It will be fun.
#creates the data frame
df<-data.frame(Technologies=c("Datascience","machinelearning","Deeplearning","Artificalintelligence"),Popularity=c("70%","85%","90%","95%"))
df
Technologies Popularity
1 Datascience 70%
2 machinelearning 85%
3 Deeplearning 90%
4 Artificalintelligence 95%
Yes, we have now created a data frame. Let’s extract some text. Run the code below to extract characters 8 to 10 from every string in the Technologies column using the substr() function in R.
#creates new column with extracted values
df$Extracted_Technologies=substr(df$Technologies,8,10)
df
Output =
Technologies Popularity Extracted_Technologies
1 Datascience 70% enc
2 machinelearning 85% lea
3 Deeplearning 90% rni
4 Artificalintelligence 95% ali
Now, you can see that we have created a new column with the extracted data. In the same way, you can extract any part of a column by specifying the start and stop positions.
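The same idea can turn the Popularity column into numbers by dropping the trailing % sign. This is only a sketch: the column name Popularity_num is invented for the example, and stringsAsFactors = FALSE is set so the columns are plain character vectors on older R versions too.

```r
# A smaller copy of the sample data, kept as character columns
df <- data.frame(
  Technologies = c("Datascience", "machinelearning"),
  Popularity = c("70%", "85%"),
  stringsAsFactors = FALSE
)

# Keep everything except the last character, then convert to numeric
df$Popularity_num <- as.numeric(substr(df$Popularity, 1, nchar(df$Popularity) - 1))

print(df$Popularity_num)  # 70 85
```

Using nchar() for the stop position means the code does not depend on every value having the same number of digits.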
We saw the substr() function in action. Now we will look at the str_sub() function from the stringr package and how it extracts substrings.
Let’s roll!
Again, we are going to create the same data frame with the Technologies and their popularity.
df<-data.frame(Technologies=c("Datascience","machinelearning","Deeplearning","Artificalintelligence"),Popularity=c("70%","85%","90%","95%"))
df
Technologies Popularity
1 Datascience 70%
2 machinelearning 85%
3 Deeplearning 90%
4 Artificalintelligence 95%
Well, let’s make use of the str_sub() function, which returns the characters between the given positions. A substring in R can be generated in many ways, and this is one of them.
#using the str_sub function from the stringr package
library(stringr)
df$Extracted_Technologies=str_sub(df$Technologies,10,15)
df
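As you can see, the str_sub() function extracted the characters at positions 10 to 15 and returned the output shown below. Where a string has fewer than 15 characters, it returns only the characters that exist.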
Technologies Popularity Extracted_Technologies
1 Datascience 70% ce
2 machinelearning 85% arning
3 Deeplearning 90% ing
4 Artificalintelligence 95% intell
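One feature of str_sub() that the base functions lack is negative positions, which count from the end of the string. The sketch below compares the two approaches on a made-up vector tech; the stringr call is guarded with requireNamespace() so the snippet still runs if the package is not installed.

```r
# Hypothetical input vector
tech <- c("Datascience", "machinelearning")

# Base R: compute the start position from the string length
last3_base <- substring(tech, nchar(tech) - 2)
print(last3_base)  # "nce" "ing"

# stringr: negative positions count back from the end of the string
if (requireNamespace("stringr", quietly = TRUE)) {
  last3_stringr <- stringr::str_sub(tech, -3, -1)
  print(identical(last3_base, last3_stringr))
}
```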
Yes, taking a substring of a given string is quite an easy task, thanks to functions like substr(), substring(), and str_sub(), which make sub-stringing interesting and exciting.
That’s all for now. Don’t forget to make use of these handy functions in your own work. Happy sub-stringing!!!
More study: R documentation