

This article is a guest post from David Kretch, Lead Data Scientist at Summit Consulting.

As R workloads grow and become increasingly resource intensive, the ability to move from a local compute environment to scalable, fully managed cloud services on Amazon Web Services (AWS) becomes extremely valuable for cost, speed, and resiliency reasons. In this two-part series, part one will cover the basics of R and common workload pairings for R on AWS. In part two, “Using R with Amazon Web Services for document analysis”, we’ll take a deeper dive into building an end-to-end document processing application with AWS services.

R is a programming language popular with statisticians, scientists, and data analysts. Its large user community has developed thousands of freely available packages, including packages for data manipulation, data visualization, specialized statistical estimation procedures, machine learning, accessing public data APIs like US Census data or Spotify, building data-based web apps, and many other areas. There are also high-quality free online books and other documentation explaining how to use R effectively.

One of the most popular sets of packages in the R ecosystem is the Tidyverse, a collection of libraries for transforming and using “tidy” data. These packages are designed to let users ingest data in whatever form it arrives (e.g., from a CSV, an API, etc.) and easily transform it into the shape needed for analysis, using a declarative “grammar” of data manipulation. The Tidyverse is a good fit for a lot of data analysis tasks, and it plays a big role in the continuing popularity of R.

Another important part of the R ecosystem is the development environment RStudio.
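To make the Tidyverse's declarative “grammar” of data manipulation a little more concrete, here is a minimal sketch in R. The CSV contents, column names, and variable names (`csv_text`, `sales_wide`, `sales_tidy`, `totals`) are made up for illustration and are not taken from the original post. It assumes the `readr` (version 2.0 or later), `tidyr`, and `dplyr` packages are installed.

```r
# Illustrative sketch only: the data and all names below are hypothetical.
library(readr)
library(tidyr)
library(dplyr)

# A small "wide" CSV: one row per region, one column per year.
csv_text <- "region,2019,2020
east,10,12
west,7,9
"

# Ingest the CSV; I() tells readr to treat the string as literal data.
sales_wide <- read_csv(I(csv_text), show_col_types = FALSE)

# Reshape into tidy form: one row per region-year observation.
sales_tidy <- sales_wide %>%
  pivot_longer(cols = -region, names_to = "year", values_to = "sales") %>%
  mutate(year = as.integer(year))

# Describe the result we want: total sales per year.
totals <- sales_tidy %>%
  group_by(year) %>%
  summarise(total_sales = sum(sales), .groups = "drop")

print(totals)
```

Each step names what should happen to the data (reshape, group, summarise) rather than spelling out loops, which is roughly what “declarative” means in this context.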
