- A univariate t-test (or one-sample t-test) is a hypothesis test that compares a sample mean to a hypothesized population mean. It reports a p-value: the probability of seeing a sample mean at least this far from the hypothesized mean if the sample really came from a population with that mean. In Python, it can be performed with the ttest_1samp function in SciPy's scipy.stats module.
- NumPy/SciPy Cheat Sheet. This cheat sheet is a quick reference for NumPy/SciPy beginners. It gives an overview of the most important NumPy and SciPy commands and functions you might need when solving the exercise sheets on Linear Algebra in Information Retrieval. It does not claim to be complete and will be extended continuously.
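As a minimal sketch of the one-sample t-test described above: the sample values and the hypothesized mean of 5.0 below are made up purely for illustration. The call itself is scipy.stats.ttest_1samp, and the hand-computed statistic is included only as a cross-check of what the function reports.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of ten measurements (invented for illustration).
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 4.8, 5.5, 5.9])

# Hypothesized population mean to test against (also an assumption).
popmean = 5.0

# Two-sided one-sample t-test.
result = stats.ttest_1samp(sample, popmean)
t_stat, p_value = result.statistic, result.pvalue

# Cross-check by hand: t = (sample mean - popmean) / (s / sqrt(n)),
# where s is the sample standard deviation (ddof=1).
manual_t = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(sample.size))

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the sample mean is unlikely under the hypothesized mean. Where to draw the line (0.05 is a common convention) is a judgment call that depends on the problem.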
Over the past few years, as the buzz around data scientists, and apparently the demand for them, has continued to grow, people have become eager to learn how to join, advance and thrive in this seemingly lucrative profession. As someone who writes on analytics and occasionally teaches it, I am often asked: how do I become a data scientist?
Adding to the complexity of my answer is that data science seems to be a multi-disciplinary field, while university departments of statistics, computer science and management deal with data quite differently.
But, setting the marketing-created jargon aside, a data scientist is simply a person who can write code in a few languages (primarily R, Python and SQL) for data querying, manipulation, aggregation and visualization, using enough statistical knowledge to give actionable insights back to the business for making decisions.
In fact, we must understand linear algebra to get there. SciPy is a scientific computing library for Python that includes linear algebra routines. If you want to learn deep learning, for example image classification, you will deal with large matrices built from your images and will need to perform many operations on them. That’s why we need SciPy. Here is the cheat sheet for the SciPy library in Python. We’ve collated a collection of cheat sheets to help you get to grips with the main libraries used in data science. They are grouped by the field each library is designed for: Basics, Databases, Data Manipulation, Data Visualization, Analysis, Machine Learning, Deep Learning and Natural Language Processing (NLP).
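To give a feel for the kind of matrix operations meant here, the following is a small sketch using NumPy arrays and the scipy.linalg module. The 2x2 matrix and the vector are arbitrary toy values, not image data; real image matrices would be far larger, but the calls are the same.

```python
import numpy as np
from scipy import linalg

# Toy matrix and right-hand side (arbitrary values, chosen for illustration).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Solve the linear system A x = b.
x = linalg.solve(A, b)

# Determinant and inverse of A.
det = linalg.det(A)
A_inv = linalg.inv(A)

# Matrix product with the @ operator; A times its inverse
# should be the identity matrix, up to floating-point error.
product = A @ A_inv

print("x =", x)
print("det(A) =", det)
```

Solving with linalg.solve is generally preferable to multiplying by an explicit inverse. The inverse is computed here only to show the call.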
This rather practical definition of a data scientist is reinforced by the wording of job listings for “data scientists”. Ergo, here are some tools for learning the primary languages in data science: Python, R and SQL. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax faster.
The inclusion of SQL may surprise some (isn’t this the NoSQL era?), but it is there for a logical reason. Both Pig and the Hive Query Language are closely associated with SQL, the original Structured Query Language. In addition, one can rely solely on the sqldf package within R (or the less widely used python-sql or python-sqlparse libraries for Pythonic data scientists), or even the Proc SQL commands within the old champion language SAS, and do most of what a data scientist is expected to do (at least in data munging).
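As a rough illustration of SQL-style data munging from Python, here is a sketch that uses the sqlite3 module from the standard library. Note that this is a stand-in: it is not sqldf, python-sql or python-sqlparse, and the sales table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0),
     ("south", 20.0), ("east", 50.0)],
)

# Typical munging step: count and total per group, largest total first.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for region, n, total in rows:
    print(region, n, total)
```

The same GROUP BY query would carry over to Hive or Proc SQL with little change, which is much of the argument for learning SQL first.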
For Python, this is a rather partial list, given that Python, the most general-purpose language in the data scientist’s quiver, can be used for many things. But for the data scientist, the NumPy, SciPy, pandas and scikit-learn packages seem the most pertinent.
Are all the thousands of R packages useful to the aspiring data scientist? No.
Accordingly, we chose the appropriate cheat sheets for you. Note that this is a curated list of lists. If there is anything that can be assumed in the field of data science, it should be that the null hypothesis is that the data scientist is intelligent enough to make his own decisions based on data and its context. Three printouts are all it takes to speed up the aspiring data scientist’s journey.
Please add additional cheat sheets in comments below.
Cheat Sheets for Python
- Python www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
- NumPy, SciPy and Pandas s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf
Cheat Sheets for R
- Short Reference Card cran.r-project.org/doc/contrib/Short-refcard.pdf
- R Functions for Regression Analysis cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
- Time Series cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
- Data Mining cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
- Quandl s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+R+Cheat+Sheet.pdf
Cross Reference between R, Python (and Matlab)
Cheat Sheets for SQL
- SQL Joins www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
- SQL and Hive hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Additional
- Cheat Sheets for Java introcs.cs.princeton.edu/java/11cheatsheet/
- Linux Cheat Sheet www.linuxstall.com/linux-command-line-tips-that-every-linux-user-should-know/