Sparklyr Cheat Sheet



Introduction to Sparklyr for Data Science

dplyr provides a grammar for manipulating tables in R. This cheat sheet will guide you through the grammar, reminding you how to select, filter, arrange, mutate, summarise, group, and join data frames and tibbles. Updated January 2017.
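
The verbs are easiest to remember by what each one does to a table. The sketch below is a plain-Python analogy, not R, dplyr, or sparklyr code. It models a table as a list of row dictionaries. The helper names (`select`, `filter_rows`, `arrange`, `mutate`, `summarise_by`, `left_join`) and the `flights`/`carriers` data are hypothetical and exist only for illustration. With sparklyr, the real dplyr verbs are instead translated into Spark SQL and run on the cluster rather than in local memory.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy tables: each row is a dict of column name -> value.
flights = [
    {"carrier": "AA", "dep_delay": 5, "distance": 1000},
    {"carrier": "UA", "dep_delay": -2, "distance": 800},
    {"carrier": "AA", "dep_delay": 20, "distance": 1500},
    {"carrier": "UA", "dep_delay": 12, "distance": 400},
]
carriers = [
    {"carrier": "AA", "name": "American"},
    {"carrier": "UA", "name": "United"},
]


def select(rows, *cols):
    """Keep only the named columns (analogous to select)."""
    return [{c: row[c] for c in cols} for row in rows]


def filter_rows(rows, pred):
    """Keep the rows for which pred(row) is true (analogous to filter)."""
    return [row for row in rows if pred(row)]


def arrange(rows, col, desc=False):
    """Sort rows by one column (analogous to arrange)."""
    return sorted(rows, key=itemgetter(col), reverse=desc)


def mutate(rows, **new_cols):
    """Add columns computed from each row (analogous to mutate)."""
    return [{**row, **{name: f(row) for name, f in new_cols.items()}} for row in rows]


def summarise_by(rows, key, **aggs):
    """Group rows by `key`, then reduce each group (group_by + summarise)."""
    out = []
    # itertools.groupby only merges adjacent rows, so sort by the key first.
    for value, group in groupby(sorted(rows, key=itemgetter(key)), key=itemgetter(key)):
        group = list(group)
        out.append({key: value, **{name: f(group) for name, f in aggs.items()}})
    return out


def left_join(left, right, on):
    """Attach matching columns from `right` to each `left` row (assumes unique keys in `right`)."""
    index = {row[on]: row for row in right}
    return [{**row, **index.get(row[on], {})} for row in left]


# Pipeline: keep delayed flights, summarise per carrier, sort, then add carrier names.
late = filter_rows(flights, lambda r: r["dep_delay"] > 0)
summary = summarise_by(
    late,
    "carrier",
    n=len,
    mean_delay=lambda g: sum(r["dep_delay"] for r in g) / len(g),
)
report = left_join(arrange(summary, "mean_delay", desc=True), carriers, on="carrier")
print(report)
```

Sorting before grouping is needed here only because `itertools.groupby` groups adjacent items. dplyr's `group_by` has no such requirement.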

Publisher: InfiniteSkills

Duration: 01:41:45

Release Date: September 2017
ISBN: 9781491996508
Video Description
Join data scientist Kelly O'Briant for an exploration of sparklyr, the package from RStudio that provides an interface to Apache Spark from R. For many data scientists who rely on R for their work, the paradigm shift from local in-memory computation to scalable distributed data processing can be complicated to navigate. This course provides an easy-to-follow, R-based method for working with big data. You'll connect to Spark, run some sparklyr code, and explore practical applications of Spark SQL and sparklyr functionality. You'll wrap up by performing exploratory analysis and feature generation on a Kaggle competition data set. Learners should have a moderate level of experience with data science tasks and workflows in R.

Learning objectives:

  1. Explore the benefits and limitations of choosing sparklyr for distributed computing in R
  2. Discover how to interact with data in Apache Spark through sparklyr and Spark SQL
  3. Understand how to connect to Spark locally or to a remote Spark cluster
  4. Learn to perform exploratory data analysis in Spark using sparklyr, dplyr, and DBI
  5. Master the differences between working with data frames in R versus Spark
  6. Understand how to build data products in R that don't rely on storing big data locally

Kelly O'Briant is a data scientist and lead R developer with Washington, DC-based B23 LLC. She holds degrees in Computational Science and Informatics from George Mason University, and in Bioinformatics from Virginia Commonwealth University. Kelly is a founder and co-organizer of the Washington, DC chapter of R-Ladies Global. She gives talks on R cloud computing, R data products, and sparklyr at R-Ladies meetups and R conferences.

Course Contents

Introduction
  Welcome to the Course (00:03:17)
  About the Author (00:02:16)
Prerequisites and Getting Started
  Introduction to Spark and Sparklyr (00:06:43)
  Sparklyr Deployment Options (00:01:45)
  Running Spark and R in the Cloud (00:03:43)
  Sparklyr Livy Connections (00:04:33)
Getting Acquainted: Spark and R in the Context of Data and Data Structures
  Set Up RStudio and Connect to Spark (00:05:37)
  Spark Data Tables and R Data References (00:02:56)
  Sparklyr Cheat Sheet Walk-Through (00:06:54)
Sparklyr and SparkSQL
  How Sparklyr Works: Dplyr Basics, Part 1 (00:06:50)
  How Sparklyr Works: Dplyr Basics, Part 2 (00:08:05)
  How Sparklyr Works: Dplyr Basics, Part 3 (00:06:37)
  Lazy Execution (00:06:53)
  Programming in Dplyr (00:05:42)
  Extending Sparklyr with Replyr (00:04:10)
Hands-On Analysis Project
  Hands-On Analysis Project (00:02:42)
  Exploratory Analysis with Sparklyr (00:06:56)
  ML Feature Generation, Part 1 (00:07:18)
  ML Feature Generation, Part 2 (00:06:10)
Conclusion
  Wrap Up and Thank You (00:02:38)

Inspired by R and its community

The RStudio team contributes code to many R packages and projects. R users are doing some of the most innovative and important work in science, education, and industry. It’s a daily inspiration and challenge to keep up with the community and all it is accomplishing.

Managing Packages

If keeping up with the growing number of packages you use is challenging, consider RStudio Package Manager.

Analyse & Explore

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying philosophy and common APIs.

ggplot2 is an enhanced data visualization package for R. Create stunning multi-layered graphics with ease.

dplyr is the next iteration of plyr, focusing only on data frames. dplyr is faster than plyr and has a more consistent API.

tidyr makes it easy to “tidy” your data. Tidy data is data that’s easy to work with: it’s easy to munge (with dplyr), visualise (with ggplot2 or ggvis) and model (with R’s hundreds of modelling packages).

purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.

stringr is a consistent, simple, and easy-to-use set of wrappers around the fantastic 'stringi' package.

Communicate & Interact

Shiny makes it incredibly easy to build interactive web applications with R. Shiny has automatic “reactive” binding between inputs and outputs and extensive pre-built widgets.

rmarkdown lets you insert R code into a markdown document. R then generates a final document, in a wide variety of formats, that replaces the R code with its results.

Use flexdashboard to publish groups of related data visualizations as a dashboard.

Model & Predict

TensorFlow™ is an open-source software library for Machine Intelligence. The R interface to TensorFlow lets you work productively using the high-level Keras and Estimator APIs and the core TensorFlow API.

The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.

sparklyr provides bindings to Spark’s distributed machine learning library. Combined with sparklyr’s dplyr interface, these bindings let you create and tune machine learning workflows on Spark, orchestrated entirely within R.

Connect & Integrate

sparklyr is an R interface to Apache Spark, a fast and general engine for big data processing. The package connects to local and remote Apache Spark clusters, provides a ‘dplyr’-compatible back end, and offers an interface to Spark’s ML algorithms.

Plumber enables you to convert your existing R code into web APIs by merely adding a couple of special comments.

The reticulate package provides a comprehensive set of tools for interoperability between Python and R.

Additional Resources

Ursa Labs is an industry-funded development group specializing in open source data science tools. It is dedicated to advancing the state of the art in high-productivity, high-performance, cross-language software for data scientists.

Databases using R
