Learn Data Science
Coming Soon
Analyze, visualize, and model real-world data
Data Science, Pandas, NumPy, Jupyter, Visualization
Turn raw data into actionable insights. Data science combines statistics, programming, and domain knowledge to extract meaning from datasets. The core Python tools are Pandas for data manipulation, Matplotlib and Seaborn for visualization, and scikit-learn for modeling.
This course covers the full data science workflow from data cleaning through exploratory analysis to building predictive models.
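As a rough illustration of that workflow, the sketch below cleans a small made-up table with Pandas, summarizes it, and fits a straight line with NumPy. The column names (`city`, `sqft`, `price`), the `predict` helper, and all values are hypothetical, chosen only for this example. A real project would load data from a file and would likely use scikit-learn for the modeling step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing sqft value and one missing price.
raw = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "sqft": [500.0, 700.0, np.nan, 900.0, 1100.0],
    "price": [100.0, 140.0, 160.0, np.nan, 220.0],
})

# 1. Clean: drop rows that have no target value, then fill the
#    remaining missing feature values with the column median.
clean = raw.dropna(subset=["price"]).copy()
clean["sqft"] = clean["sqft"].fillna(clean["sqft"].median())

# 2. Explore: average price per city.
mean_price = clean.groupby("city")["price"].mean()

# 3. Model: least-squares line, price = slope * sqft + intercept.
slope, intercept = np.polyfit(
    clean["sqft"].to_numpy(), clean["price"].to_numpy(), deg=1
)

def predict(sqft):
    """Predict a price from square footage using the fitted line."""
    return slope * sqft + intercept
```

Each numbered comment maps to one stage named above: cleaning, exploratory analysis, and a first predictive model. The median fill is only one of several reasonable choices for missing values.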
Start Here — Learning Roadmap
A suggested path from zero to mastery. Follow these steps in order:
- Learn Python basics — Variables, loops, functions, and data structures (lists, dicts) are prerequisites for data science
- Set up Jupyter notebooks — Install Anaconda or JupyterLab to get an interactive environment for data exploration
- Master NumPy arrays — Understand array operations, broadcasting, vectorization, and why NumPy is faster than Python loops
- Wrangle data with Pandas — Load CSVs, filter rows, group and aggregate, handle missing values, and merge DataFrames
- Visualize with Matplotlib and Seaborn — Create line charts, bar plots, histograms, heatmaps, and scatter plots to communicate findings
- Learn statistics fundamentals — Understand distributions, hypothesis testing, p-values, correlation, and confidence intervals
- Build ML models with scikit-learn — Train linear regression, decision tree, and random forest models, and evaluate them with cross-validation
- Handle real-world data problems — Deal with missing data, outliers, imbalanced classes, feature engineering, and data leakage
- Communicate results — Create compelling data stories with clear visualizations, dashboards, and written analysis
- Scale to production — Use Polars for large datasets and SQL for database queries, and learn MLOps basics for model deployment
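The NumPy step in the roadmap above mentions vectorization and broadcasting, which are easier to see in code. The sketch below standardizes each column of a small array twice: once with explicit Python loops and once with a single broadcast expression. The array values and both function names are invented for illustration.

```python
import numpy as np

# Hypothetical data: 4 samples (rows) x 3 features (columns).
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
])

def standardize_loop(data):
    """Column-wise z-scores computed with explicit Python loops."""
    n_rows, n_cols = data.shape
    out = np.empty_like(data)
    for j in range(n_cols):
        col = [data[i, j] for i in range(n_rows)]
        mean = sum(col) / n_rows
        # Population standard deviation (divide by n), matching np.std's default.
        std = (sum((v - mean) ** 2 for v in col) / n_rows) ** 0.5
        for i in range(n_rows):
            out[i, j] = (data[i, j] - mean) / std
    return out

def standardize_vectorized(data):
    """Same result in one expression.

    data has shape (4, 3); the column means and stds have shape (3,).
    Broadcasting stretches those (3,) arrays across all 4 rows.
    """
    return (data - data.mean(axis=0)) / data.std(axis=0)

Z_loop = standardize_loop(X)
Z_vec = standardize_vectorized(X)
```

The vectorized version runs its arithmetic in compiled code rather than in the Python interpreter, which is the main reason it tends to be faster on large arrays. Timing both functions with `timeit` on your own data is the way to confirm the difference.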
Official & Core Documentation
- Pandas Documentation — Data manipulation library with getting started tutorials and API reference (All levels)
- NumPy Documentation — Numerical computing fundamentals, array operations, and linear algebra (Beginner)
- scikit-learn User Guide — Machine learning algorithms with theory, examples, and parameter tuning (Intermediate)
- Matplotlib Documentation — Plotting and visualization reference with gallery of examples (Beginner)
- Seaborn Documentation — Statistical data visualization built on Matplotlib with elegant defaults (Beginner)
- Jupyter Documentation — Interactive notebook environment for data exploration and presentation (Beginner)
- Polars Documentation — Fast DataFrame library as a modern Pandas alternative for large datasets (Intermediate)
- SciPy Documentation — Scientific computing library for optimization, statistics, and signal processing (Intermediate)
- AI & Data Scientist Roadmap — Visual step-by-step guide to the data science learning path (Beginner)
GitHub Awesome Lists & Curated Collections
- awesome-datascience — Comprehensive data science repository to learn and apply for real-world problems (24k+ stars)
- awesome-python-data-science — Curated list of Python libraries for data science organized by category
- awesome-data-analysis — 500+ curated resources including Python, SQL, statistics, ML, and visualization
- datascience (r0f1) — Curated list of Python resources for data science including libraries and tutorials
- awesome-learn-datascience — Resources to help you get started with data science from scratch
- Data-Science-Roadmap — Complete data science roadmap from A to Z with learning resources
Interactive Courses & Hands-On Platforms
Free Courses
- Kaggle Learn — Free micro-courses on Pandas, SQL, visualization, and ML with hands-on notebooks (Beginner)
- freeCodeCamp — Data Analysis with Python — Free certification course with 5 real projects (Beginner)
- Harvard CS109 — Data Science — Full Harvard data science course materials available free online (Intermediate)
- MIT OpenCourseWare — Data Science — Free MIT course materials on statistics and data analysis (Intermediate)
University & MOOC Courses
- Google Data Analytics Certificate — Professional certificate on Coursera covering the full analytics workflow (Beginner)
- IBM Data Science Professional Certificate — Comprehensive program covering the full data science workflow (Beginner)
- DataCamp — Interactive data science courses with free introductory tier and browser-based coding (Beginner)
- Johns Hopkins Data Science Specialization — 10-course specialization covering R and statistics (Intermediate)
Practice & Challenges
- Kaggle Competitions — Real ML challenges with datasets, leaderboards, and prize money (All levels)
- DrivenData — Data science competitions for social good with real-world impact (Intermediate)
- StrataScratch — SQL and Python interview questions sourced from real company interviews (Intermediate)
Video Courses & YouTube Channels
Structured Course Playlists
- 3Blue1Brown — Essence of Linear Algebra — Visual math foundations every data scientist needs (Beginner)
- Corey Schafer — Pandas — Practical Pandas tutorials for real data tasks (Beginner)
- freeCodeCamp — Data Science Full Course — Complete data science beginner course covering Python, stats, and ML (Beginner)
Individual Creators & Channels
- StatQuest with Josh Starmer — Statistics and ML concepts explained clearly with animations (All levels)
- Krish Naik — ML, deep learning, and data science with hands-on tutorials (Intermediate)
- Ken Jee — Data science projects, Kaggle walkthroughs, and career advice (Beginner)
- codebasics — Data analytics and data science through practical project-based tutorials (Beginner)
- sentdex — Python programming for data science, ML, and financial analysis (Intermediate)
- Alex The Analyst — Data analytics tutorials, portfolio projects, and career guidance (Beginner)
Books & Long-Form Reading
Free Online Books
- Python for Data Analysis (3rd Ed.) — Wes McKinney’s Pandas bible, free to read online (Beginner)
- Think Stats (2nd Ed.) — Statistics for programmers using Python with real datasets (Beginner)
- An Introduction to Statistical Learning (ISLR/ISLP) — Free textbook covering statistical learning methods, with separate R and Python editions (Intermediate)
- Probabilistic Programming & Bayesian Methods for Hackers — Bayesian statistics for the computationally inclined (Advanced)
Essential Paid Books
- Hands-On Machine Learning (3rd Ed.) — Practical ML with scikit-learn, Keras, and TensorFlow by Aurélien Géron (Intermediate, Paid)
- Storytelling with Data — Data visualization best practices and communication techniques (Beginner, Paid)
- Naked Statistics — Accessible introduction to statistics for non-math backgrounds (Beginner, Paid)
- Python Data Science Handbook — Jake VanderPlas’s guide to NumPy, Pandas, Matplotlib, and scikit-learn (Intermediate, Paid)
- The Art of Statistics — David Spiegelhalter’s guide to learning from data (Beginner, Paid)
Community, Practice & News
Forums & Discussion
- r/datascience — Active community for data science discussion, career advice, and resources
- r/learnmachinelearning — Beginner-friendly ML learning community with project sharing
- Cross Validated (Stack Exchange) — Expert Q&A for statistics, probability, and data analysis
- Data Science Stack Exchange — Focused Q&A for data science practitioners and researchers
Newsletters & Blogs
- Data Elixir — Weekly curated data science news, articles, tools, and resources
- Data Science Weekly — Free weekly digest of data science, ML, and AI articles and jobs
- Towards Data Science — Medium publication with thousands of data science tutorials and insights
- KDnuggets — Data science news, tutorials, cheat sheets, and career advice since 1997
Ecosystem Resources
- Kaggle Datasets — Thousands of free datasets for practice, from tabular to image to text
- UCI Machine Learning Repository — Classic ML datasets used in research and education
- Google Dataset Search — Search engine for finding datasets across the web
Tools & Environments
- JupyterLab — Interactive notebook environment for data exploration, visualization, and documentation
- Google Colab — Free cloud-based Jupyter notebooks with GPU access and pre-installed libraries
- Anaconda — Python distribution with data science packages pre-installed and environment management
- Streamlit — Turn Python data scripts into shareable web apps with minimal code
- DuckDB — Fast in-process analytical database that queries Pandas DataFrames, Parquet, and CSV directly
- Observable — Interactive data visualization and analysis notebooks for the web
- Kaggle Notebooks — Free cloud notebooks with GPU/TPU access and direct dataset integration
- Deepnote — Collaborative data science notebooks with real-time editing and SQL integration
Cheat Sheets & Quick References
- Pandas Cheat Sheet — Official quick reference for DataFrame operations, filtering, grouping, and reshaping
- NumPy Cheat Sheet — Official reference for array creation, math operations, and broadcasting
- Matplotlib Cheat Sheet — Official visual reference for plot types, styling, and customization
- scikit-learn Algorithm Cheat Sheet — Decision flowchart for choosing the right ML algorithm based on your data
- Statistics Cheat Sheet — Quick reference for distributions, hypothesis tests, and confidence intervals
Project Ideas & Datasets
- Awesome Public Datasets — Curated list of free datasets organized by topic (economics, healthcare, social, sports)
- FiveThirtyEight Data — Datasets behind FiveThirtyEight articles covering sports, politics, and society
- TidyTuesday — Weekly social data project with new datasets and community visualizations
- Data Science Project Ideas (Dataquest) — Guided project ideas from beginner through advanced with real datasets