Data Project

Name:
Instructor:
Subject:
Date of submission:
Data Visualisation Data Project
Introduction
Data visualization is an integral skill for researchers. Scholars should therefore equip themselves with it, because it helps them gain useful insights from datasets (Ginde, n.p.). R is open-source software that can help scholars achieve this objective. For this reason, the researcher used R to investigate a dataset retrieved from Michigan State University. This paper investigates the Taxi05 data set, which has five variables: distance, fare, call, minutes, and payment. Fare, distance, and minutes are quantitative variables, while payment and call are qualitative variables.
The researcher used three packages to examine the data: data.table, ggplot2, and lattice. The data.table package was used to read the data directly from the source, while the lattice and ggplot2 packages were used to create the visualizations. For instance, the researcher created a bar graph displaying the mode of payment used in the data set. From the graph (see figure 1), it is evident that card payments were slightly more frequent than cash payments.

Figure 1. A bar graph displaying the mode of payment for the Taxi05 data set
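The loading and bar-graph steps described above can be sketched roughly as follows. This is a minimal illustration, not the researcher's original code: the six rows are hypothetical stand-ins for Taxi05 (the source location is not given in this paper), and the payment labels "card" and "cash" are assumed. Only the column names come from the paper.

```r
library(data.table)
library(ggplot2)

# Hypothetical rows standing in for the Taxi05 file; with the real data,
# fread() would be pointed at the source file or URL instead of inline text.
taxi <- fread(text = "distance,fare,call,minutes,payment
1.2,7.50,Street_Hall,6,cash
2.5,11.30,Dispatch,10,card
0.9,6.80,Street_Hall,5,card
4.8,19.90,Dispatch,17,card
2.1,10.40,Street_Hall,9,cash
7.3,33.60,Dispatch,26,card")

# Count trips per payment mode using data.table syntax
payment_counts <- taxi[, .N, by = payment]
print(payment_counts)

# Bar graph of payment mode; geom_bar() counts the rows in each category
bar_plot <- ggplot(taxi, aes(x = payment)) +
  geom_bar() +
  labs(x = "Mode of payment", y = "Number of trips")
print(bar_plot)
```

The counts table gives the same information as the bar heights, which is a convenient check on the graph.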
A histogram for the distance variable was generated (see figure 2). The histogram indicates that distance is skewed to the right and centered close to 2.5.

Figure 2. A frequency histogram displaying the distribution of distance for the Taxi05 data set
A relative frequency histogram for the distribution of minutes was created with the help of the lattice package. According to the graph, the distribution of minutes is skewed to the left and centered at around ten minutes (see figure 3).

Figure 3. A relative frequency histogram for the distribution of minutes in the Taxi05 data set
Summary statistics for the fare in the Taxi05 data set were also examined, and the results are displayed below (see figure 4). They indicate that size (n) = 500, mean = 14.26, median = 11.30, standard deviation = 9.01, variance = 81.21, range = 65, first quartile = 8.30, third quartile = 17.25, and interquartile range = 8.95.
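A relative frequency histogram of this kind can be drawn with lattice along the following lines. This is a sketch under assumptions: the minutes values are invented for illustration and are not meant to reproduce the shape reported above, and the axis labels are the editor's choice.

```r
library(lattice)

# Hypothetical trip durations in minutes (not the Taxi05 values)
trips <- data.frame(minutes = c(4, 6, 7, 8, 9, 10, 10, 11, 12, 14, 18, 25))

# type = "percent" plots relative frequencies instead of raw counts
minutes_hist <- histogram(~ minutes, data = trips, type = "percent",
                          xlab = "Trip duration (minutes)",
                          ylab = "Percent of trips")
print(minutes_hist)
```

With the real data, only the data frame passed to `data =` would change.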

Figure 4. An image displaying summary statistics for the Taxi05 data set
A box plot of fare was also created using the ggplot2 package (see figure 5). It indicates that fare is positively skewed, with fares above 30 appearing as outliers.
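The statistics listed above can all be computed with base R functions. The sketch below uses a short hypothetical fare vector rather than the 500 Taxi05 fares, so its output will not match the reported values; it only shows which function produces each quantity.

```r
# Hypothetical fares (not the Taxi05 values)
fare <- c(6.5, 8.3, 9.8, 11.3, 12.0, 17.25, 21.4, 34.6)

fare_stats <- c(
  n        = length(fare),
  mean     = mean(fare),
  median   = median(fare),
  sd       = sd(fare),
  variance = var(fare),                       # equals sd squared
  range    = diff(range(fare)),               # maximum minus minimum
  q1       = unname(quantile(fare, 0.25)),
  q3       = unname(quantile(fare, 0.75)),
  iqr      = IQR(fare)                        # equals q3 minus q1
)
print(round(fare_stats, 2))
```

The same two identities hold for the reported figures: 17.25 - 8.30 = 8.95, and 9.01 squared is approximately 81.21 once rounding of the standard deviation is taken into account.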

Figure 5. A box plot showing the distribution of fare from the Taxi05 data set
A side-by-side box plot of fare separated by call was also created (see figure 6). From the figure, it is clear that the average fare for Dispatch calls is higher than the average fare for Street_Hall calls. It is also clear that the Street_Hall fares are positively skewed with several outliers, while the Dispatch fares are approximately normally distributed with few outliers.
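The outlier threshold of about 30 is consistent with the common 1.5 × IQR rule, which ggplot2's geom_boxplot() applies by default. Whether the researcher derived the threshold this way is an assumption; the short calculation below simply applies the rule to the quartiles reported earlier in this paper.

```r
# Quartiles of fare as reported in the summary statistics
q1 <- 8.30
q3 <- 17.25

iqr <- q3 - q1                  # 8.95
upper_fence <- q3 + 1.5 * iqr   # 30.675
lower_fence <- q1 - 1.5 * iqr   # -5.125

print(c(iqr = iqr, upper_fence = upper_fence, lower_fence = lower_fence))
```

Because the lower fence is negative and a fare cannot be, outliers can only appear on the high side, which agrees with the positive skew described above.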

Figure 6. A box plot for the distribution of fare grouped by call in the Taxi05 data set
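A grouped box plot like the one described can be sketched with ggplot2 as follows. The fares are hypothetical; only the variable names and the two call labels are taken from the paper.

```r
library(ggplot2)

# Hypothetical fares for each call type (not the Taxi05 values)
taxi <- data.frame(
  call = rep(c("Dispatch", "Street_Hall"), each = 5),
  fare = c(9.5, 12.0, 14.8, 17.3, 21.0,
           6.2, 7.9, 9.4, 11.1, 28.5)
)

# One box per call type, drawn side by side on a shared fare axis
call_plot <- ggplot(taxi, aes(x = call, y = fare)) +
  geom_boxplot() +
  labs(x = "Call type", y = "Fare")
print(call_plot)

# The line inside each box is the group median; the group mean is shown for comparison
group_medians <- tapply(taxi$fare, taxi$call, median)
group_means <- tapply(taxi$fare, taxi$call, mean)
print(rbind(median = group_medians, mean = group_means))
```

Printing the group medians and means alongside the plot is useful because a box plot marks the median, not the mean, of each group.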
Finally, a scatter plot of distance against fare was generated (see figure 7). The scatter plot indicates that there is a positive linear relationship between distance and fare.
Figure 7. A scatter plot of distance and fare from the Taxi05 data set
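The scatter plot and the linear relationship it suggests can be sketched as below. The data are hypothetical, placing distance on the horizontal axis is an assumption, and the correlation coefficient and fitted line are additions for illustration that the paper does not report.

```r
library(ggplot2)

# Hypothetical distance and fare pairs (not the Taxi05 values)
taxi <- data.frame(
  distance = c(0.8, 1.5, 2.4, 3.1, 4.6, 6.2),
  fare     = c(5.5, 8.0, 10.9, 13.2, 18.4, 24.1)
)

# Scatter plot with a least-squares line overlaid
scatter_plot <- ggplot(taxi, aes(x = distance, y = fare)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Distance", y = "Fare")
print(scatter_plot)

# Numerical summaries of the linear relationship
r <- cor(taxi$distance, taxi$fare)        # Pearson correlation
fit <- lm(fare ~ distance, data = taxi)   # slope = fare change per unit of distance
print(r)
print(coef(fit))
```

A correlation close to 1 and a positive slope would support the visual impression of a positive linear relationship.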
Conclusion
In conclusion, this exercise shows that data visualization is important because of the deductions it makes possible. For instance, the researcher was able to examine the difference in the distribution of fare grouped by call. The researcher was also able to establish the presence of a positive linear relationship between distance and fare. The code used to produce the reported output is displayed in the appendix below. In short, data visualization can help researchers draw clear and concise insights from data sets (Rahlf, p. 1).

Works Cited
Ginde, Gouri. Visualisation of massive data from scholarly Article and Journal Database: A Novel Scheme. Department of Computer Science and Engineering, PESIT Bangalore South Campus, Bangalore, India, n.d., https://arxiv.org/ftp/arxiv/papers/1611/1611.01152.pdf. Accessed 07 February 2018.
Rahlf, Thomas. “Data Visualisation with R: 100 Examples.” Journal of Statistical Software, vol. 81, no. 2, 2017, pp. 1-5.

Appendix
Code used to generate the output.
