Data analysis is steadily gaining popularity, and with it the question of how to perform data analytics using R, a tool that enables data analysts to carry out both analysis and visualization. A central concept in data analysis with R is exploratory data analysis, an approach introduced by John Tukey for summarizing and visualizing a data set. Its focus is on examining the basic structure and variables of the data in order to build an initial understanding of the data set, gain deeper insight into the data’s origin, and determine which methods of statistical analysis would be appropriate. To introduce R as a software for data analysis and to explain how and why it can be used to analyse data effectively, The Bhawanipur Education Society College organised an interactive session on “Data Analysis Through R” in association with the International Skill Development Corporation and the Institute of Analytics (IoA), London, from 3:30 pm onwards on Tuesday, October 6, 2020. The speaker for the session was Dr. Vinod Kumar Murti, who is at present associated with the IoA, headquartered in London, as Country Head for India, based in Bangalore.
Dr. Vinod Kumar Murti is an industry professional turned academician with 17 years of experience in the engineering industry and 19 years in academics. He holds an engineering degree (B.E.) in the mechanical discipline, an MBA in Finance and Marketing, and a PhD in Finance. Dr. Vinod is passionate about data analytics and is currently writing a book on Multivariate Data Analysis. IBM has authorized him as a Certified Trainer for IBM-CEBT (Career Education for Business Transformation) in the area of Predictive Analysis. He has delivered corporate training on data analysis for companies such as Accenture, Capgemini, Prudential Global, Hewlett-Packard, Goldman Sachs and ANZ Bank. His research interest lies in developing ‘Corporate Bankruptcy Prediction’ models using advanced data mining tools such as Neural Networks, Adaptive Boosting, Random Forests, Fuzzy Logic, Support Vector Machines and Genetic Algorithms.
The interactive session began at 3:30 pm sharp, lasted until 7 pm, and had around 110 active participants throughout. It opened with an introduction to the rules and protocols for participants to follow, followed by an introductory speech by Prof. Dilip Shah, Dean of Student Affairs, BESC, in which he shared his views on new-age technology and wished the participants a fruitful session. The speaker was then introduced by the student co-ordinators.
After the introductions, the floor was taken by Mr. Daya Murthy, Head of Institutional Partnership, International Skill Development Corporation (ISDC), who spoke on why data analytics is important and how it is done. The next part of the session was the introduction to R by Dr. Vinod Kumar Murti.
Using R makes data analysis more efficient because it enables analysts to process data sets that were traditionally considered large. For example, it was previously not feasible to process a data set of 500,000 cases in one go, but with R, on a machine with at least 2 GB of memory, data sets of 500,000 cases and around 100 variables can be processed. Before going deeper into what data analytics using R involves, it was important to understand the basic interface of R. The R software has four basic components: R Console, R Script, R Environment and Graphical Output. In summary, R lets analysts write code in the console, run commands from a script, inspect variables and data sets in the environment, and present the data as graphical output. Users can analyse data with R in four simple steps, using the following components:
R Console: In the R console, analysts can write code to run against the data and view the resulting output. The code can also be written in the R script.
R Script: The R script is the interface where analysts write their code. The process is simple: users write the code and, to run it, press Ctrl+Enter or use the “Run” button at the top of the R script.
R Environment: The R environment is the space that holds everything brought into the session: the actual data set, followed by the variables, vectors and functions used to work on the data. All data is added here, and users can check whether it has been loaded correctly into the environment.
Graphical Output: Once the scripts and code have been written and the data sets and variables have been added to R, the graphical output feature can be used to create graphs after the exploratory data analysis has been performed.
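The four components described above can be tried out with a few lines of R. This is only a minimal sketch and not the material used in the session: the vector `scores` and the helper function `value_range` are hypothetical names made up for illustration.

```r
# R Script: lines written in a script are run with Ctrl+Enter or "Run".

# R Environment: objects created by assignment are stored in the environment.
scores <- c(52, 67, 71, 48, 90, 85, 63)   # hypothetical numeric vector

# A small user-defined function (hypothetical), also stored in the environment.
value_range <- function(x) max(x) - min(x)

# R Console: evaluating an expression prints its result in the console.
mean(scores)
value_range(scores)

# List the objects currently held in the environment.
ls()

# Graphical Output: base R plotting functions draw to the graphics device.
hist(scores, main = "Hypothetical scores", xlab = "Score")
```

Running the lines one at a time shows each component in turn: the objects appear in the environment, the numeric results are printed in the console, and the histogram is drawn as graphical output.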
The speaker introduced the interface of R in a very friendly manner before moving on to how R can be used and the types of data sets that it can analyse efficiently. Next, with the help of a CSV file that had been provided to the registrants beforehand, he taught the participants how to import data files. After the import step, he showed the participants how to perform statistical analysis and introduced the functions that support it. He also explained how to modify data sets in R to make analysis more convenient. He then explained how to visualise data sets, described what histograms, bar diagrams and heat maps are, and taught the participants how to create them. After introducing these diagrams and visualisation tools, the speaker discussed what descriptive statistics are, how they are used and what they mean, before explaining how to compute them in R.
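The steps covered in this part of the session (import, descriptive statistics, modifying the data and visualisation) could look roughly like the sketch below. The CSV file used in the workshop is not reproduced in this report, so the example writes a small hypothetical file first; the column names, the values and the threshold of 200 are assumptions made purely for illustration. Only base R functions are used.

```r
# Hypothetical data standing in for the workshop's CSV file.
# It is written to a temporary file so the import step can be shown in full.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "student,stream,maths,stats,economics",
  "A,Commerce,62,70,58",
  "B,Science,81,77,69",
  "C,Commerce,55,64,72",
  "D,Arts,47,59,66",
  "E,Science,90,85,74",
  "F,Commerce,68,73,61"
), csv_path)

# Import: read.csv() returns a data frame.
marks <- read.csv(csv_path, stringsAsFactors = FALSE)
str(marks)        # column names and types
head(marks, 3)    # first rows, to check that the import worked

# Descriptive statistics for individual columns.
summary(marks$maths)
mean(marks$stats)
sd(marks$stats)

# Changing the data set: add a derived column, then keep a subset of rows.
marks$total <- marks$maths + marks$stats + marks$economics
high_scorers <- subset(marks, total >= 200)

# Visualisation: histogram, bar diagram and heat map.
hist(marks$maths, main = "Maths marks", xlab = "Marks")
barplot(table(marks$stream), main = "Students per stream", ylab = "Count")

score_matrix <- as.matrix(marks[, c("maths", "stats", "economics")])
rownames(score_matrix) <- marks$student
heatmap(score_matrix, Rowv = NA, Colv = NA, scale = "none",
        main = "Marks by student and subject")
```

With a real file, the `writeLines()` step would be dropped and `csv_path` replaced by the path to the file on disk.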
The best part of the session was that it was completely hands-on, and every query was answered by the speaker. The session was highly interactive, with participants asking questions continuously and the speaker asking them questions in turn to make sure that each concept and function was clear.
The session ended with the speaker praising the participants for their enthusiasm and observing that he had never faced an audience that learnt R so quickly. After his closing remarks, the session concluded with the vote of thanks and with the participants expressing their gratitude and views, before the launch of the quiz that was held at the end to test the knowledge gained in the workshop.
The student co-ordinators of the session were Kashish Burman and Shreyans Jaiswal. All participants had been given assistance with installing R and RStudio a day earlier by the assisting team, which consisted of Arpana Gupta, Chahna Rungta, Vanshika Jain, Sakshi Luhariwala, Anshu Mimani and Yashvi Doshi.