Introduction to Sparklyr for Data Science
dplyr provides a grammar for manipulating tables in R. This cheat sheet will guide you through the grammar, reminding you how to select, filter, arrange, mutate, summarise, group, and join data frames and tibbles. Updated January 2017.
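As a rough illustration of the verbs listed above, the following sketch chains them on two small tables. The `flights` and `carriers` tibbles and their values are made up for this example and do not come from the cheat sheet.

```r
library(dplyr)

# Illustrative data only (hypothetical values)
flights <- tibble(
  carrier  = c("AA", "AA", "DL", "DL", "UA"),
  distance = c(500, 1200, 300, 900, 1500),
  delay    = c(10, -5, 30, 0, 15)
)
carriers <- tibble(
  carrier = c("AA", "DL", "UA"),
  name    = c("American", "Delta", "United")
)

delay_summary <- flights %>%
  select(carrier, distance, delay) %>%        # keep only these columns
  filter(distance > 400) %>%                  # keep rows matching a condition
  mutate(late = delay > 0) %>%                # add a derived column
  group_by(carrier) %>%                       # define groups
  summarise(
    mean_delay = mean(delay),                 # one row per group
    n_late     = sum(late)
  ) %>%
  arrange(desc(mean_delay)) %>%               # sort rows
  left_join(carriers, by = "carrier")         # add columns from another table

print(delay_summary)
```

Each verb takes a table and returns a table, which is what lets the steps be piped together in this way.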
Publisher: InfiniteSkills
Duration: 01:41:45
Release Date: September 2017
ISBN: 9781491996508
Video Description
Join data scientist Kelly O'Briant for an exploration of sparklyr, the package from RStudio which provides an interface to Apache Spark from R. For many data scientists who rely on R for their work, the paradigm shift from local in-memory computations to scalable distributed data processing can be complicated to navigate. This course provides an easy-to-follow R-based method for working with big data. You'll connect to Spark, run some sparklyr code, and explore some practical applications of Spark SQL and sparklyr functionality. You'll wrap up by performing some exploratory analysis and feature generation using a Kaggle competition data set. Learners should have a moderate level of experience with doing data science tasks or workflows in R.
Explore the benefits and limitations of choosing sparklyr for distributed computing in R
Discover how to interact with data in Apache Spark through sparklyr and Spark SQL
Understand how to connect to Spark locally or to a remote Spark cluster
Learn to perform exploratory data analysis in Spark using sparklyr, dplyr, and DBI
Master the differences between working with data frames in R versus Spark
Understand how to build data products in R that don't rely on storing big data locally
Kelly O'Briant is a data scientist and lead R developer with Washington, DC-based B23 LLC. She holds degrees in Computational Science and Informatics from George Mason University, and Bioinformatics from Virginia Commonwealth University. Kelly is a founder and co-organizer of the Washington DC chapter of R-Ladies Global. She gives talks on R cloud computing, R data products, and sparklyr at R-Ladies meetups and R conferences.
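The workflow described above (connect to Spark, run sparklyr code, query with Spark SQL) might look roughly like the sketch below. It is not taken from the course: it assumes a local Spark installation is already available (for example via `spark_install()`), and it uses the built-in `mtcars` data frame and the table name `mtcars` purely for illustration.

```r
library(sparklyr)
library(dplyr)
library(DBI)

# Connect to a local Spark instance (assumption: Spark is installed locally).
# A remote cluster would use a different master URL.
sc <- spark_connect(master = "local")

# Copy a small R data frame into Spark; the result is a reference to a
# Spark table, not a local data frame.
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars", overwrite = TRUE)

# dplyr verbs on a Spark table are translated to Spark SQL and run in Spark.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl)

# collect() brings the (small) result back into R as a local tibble.
local_result <- collect(mpg_by_cyl)

# The same table can also be queried with Spark SQL through DBI.
sql_result <- dbGetQuery(sc, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")

spark_disconnect(sc)
```

Keeping the aggregation in Spark and collecting only the summary is what allows this pattern to work on data that would not fit in local memory.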
Introduction
Welcome To The Course
00:03:17
About The Author
00:02:16
Prerequisites And Getting Started
Introduction To Spark And Sparklyr
00:06:43
Sparklyr Deployment Options
00:01:45
Running Spark And R In The Cloud
00:03:43
Sparklyr Livy Connections
00:04:33
Getting Acquainted: Spark And R In The Context Of Data And Data Structures
Set Up RStudio And Connect To Spark
00:05:37
Spark Data Tables And R Data References
00:02:56
Sparklyr Cheat Sheet Walk Through
00:06:54
Sparklyr And SparkSQL
How Sparklyr Works: Dplyr Basics Part - 1
00:06:50
How Sparklyr Works: Dplyr Basics Part - 2
00:08:05
How Sparklyr Works: Dplyr Basics Part - 3
00:06:37
Lazy Execution
00:06:53
Programming In Dplyr
00:05:42
Extending Sparklyr With Replyr
00:04:10
Hands-On Analysis Project
Hands-On Analysis Project
00:02:42
Exploratory Analysis With Sparklyr
00:06:56
ML Feature Generation Part - 1
00:07:18
ML Feature Generation Part - 2
00:06:10
Conclusion
Wrap Up And Thank You
00:02:38
Inspired by R and its community
The RStudio team contributes code to many R packages and projects. R users are doing some of the most innovative and important work in science, education, and industry. It’s a daily inspiration and challenge to keep up with the community and all it is accomplishing.
Managing Packages
If keeping up with the growing number of packages you use is challenging, consider RStudio Package Manager.
Analyse & Explore
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying philosophy and common APIs.
ggplot2 is an enhanced data visualization package for R. Create stunning multi-layered graphics with ease.
dplyr is the next iteration of plyr, focusing only on data frames. dplyr is faster and has a more consistent API.
tidyr makes it easy to “tidy” your data. Tidy data is data that’s easy to work with: it’s easy to munge (with dplyr), visualise (with ggplot2 or ggvis) and model (with R’s hundreds of modelling packages).
purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
stringr provides a consistent, simple and easy-to-use set of wrappers around the fantastic 'stringi' package.
Communicate & Interact
Shiny makes it incredibly easy to build interactive web applications with R. Shiny has automatic “reactive” binding between inputs and outputs and extensive pre-built widgets.
rmarkdown lets you insert R code into a markdown document. R then generates a final document, in a wide variety of formats, that replaces the R code with its results.
Use flexdashboard to publish groups of related data visualizations as a dashboard.
Model & Predict
TensorFlow™ is an open-source software library for Machine Intelligence. The R interface to TensorFlow lets you work productively using the high-level Keras and Estimator APIs and the core TensorFlow API.
The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.
Sparklyr provides bindings to Spark’s distributed machine learning library. Together with sparklyr’s dplyr interface, you can easily create and tune machine learning workflows on Spark, orchestrated entirely within R.
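A minimal sketch of such a workflow follows, assuming a local Spark installation. The `mtcars` data, the table name `mtcars_ml`, the 70/30 split, and the model formula are all illustrative choices rather than anything prescribed by the package.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars_ml", overwrite = TRUE)

# Split the Spark table into training and test sets inside Spark
partitions <- sdf_random_split(mtcars_tbl, training = 0.7, test = 0.3, seed = 42)

# Fit a linear regression with Spark's ML library using an R formula
fit <- ml_linear_regression(partitions$training, mpg ~ wt + cyl)

# Score the held-out rows in Spark, then bring a small result back to R
predictions <- ml_predict(fit, partitions$test) %>%
  select(mpg, prediction) %>%
  collect()

spark_disconnect(sc)
```

The data preparation uses the dplyr interface and the model fitting uses the ML bindings, so the whole pipeline stays in one R script while the computation runs in Spark.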
Connect & Integrate
Sparklyr is an R interface to Apache Spark, a fast and general engine for big data processing. This package supports connecting to local and remote Apache Spark clusters, provides a ‘dplyr’ compatible back-end, and provides an interface to Spark’s ML algorithms.
Plumber enables you to convert your existing R code into web APIs by merely adding a couple of special comments.
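To give a sense of what those special comments look like, here is a small sketch of an annotated R file. The file name `plumber.R`, the two endpoints, and the port number are hypothetical and chosen only for this example.

```r
# plumber.R (hypothetical file name)
# Lines starting with "#*" are plumber annotations; the function that
# follows each annotation block becomes the handler for that endpoint.

#* Echo back the supplied message
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Return the sum of two numbers
#* @param a The first number
#* @param b The second number
#* @post /sum
function(a, b) {
  # Request parameters may arrive as strings, so convert before adding
  as.numeric(a) + as.numeric(b)
}

# To serve this file from another R session (assumes plumber is installed):
#   plumber::pr_run(plumber::pr("plumber.R"), port = 8000)
```

The functions themselves are ordinary R; only the comments tell plumber which HTTP method and path each one should answer.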
The reticulate package provides a comprehensive set of tools for interoperability between Python and R.
Additional Resources
Ursa Labs is an industry-funded development group specializing in open source data science tools. It is dedicated to advancing the state of the art in high-productivity, high-performance, cross-language software for data scientists.
Databases using R