Explore Datasets

Heart Disease UCI
Source: Kaggle / UCI | Updated: 2 years ago | Size: 11 KB

A popular dataset for binary classification, predicting the presence of heart disease from patient attributes such as age, sex, chest pain type, resting blood pressure and serum cholesterol.

Healthcare
Classification
Tabular
Titanic - Machine Learning from Disaster
Source: Kaggle | Updated: Competition | Size: 33 KB

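A classic introductory dataset for predicting which passengers survived the Titanic sinking from attributes such as ticket class, sex, age and fare. Well suited to practicing data cleaning (for example, missing ages and cabin numbers) and feature engineering.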

Tabular
Binary Classification
Starter
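As a sketch of the kind of feature engineering this dataset is used to practice, the snippet below derives two commonly used features from one passenger record: the honorific title parsed from the name, and family size computed from the sibling/spouse and parent/child counts. It assumes the column names of the Kaggle `train.csv` (`Name`, `SibSp`, `Parch`) and names formatted as "Surname, Title. Given names"; the helper `engineer_features` is hypothetical.

```python
import re

# Matches the honorific between the comma and the first period, e.g. "Mr" in
# "Braund, Mr. Owen Harris" (assumes the "Surname, Title. Given names" format).
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def engineer_features(row: dict) -> dict:
    """Hypothetical helper: add Title, FamilySize and IsAlone to one passenger record."""
    match = TITLE_PATTERN.search(row["Name"])
    title = match.group(1).strip() if match else "Unknown"
    # Family size = siblings/spouses aboard + parents/children aboard + the passenger.
    family_size = int(row["SibSp"]) + int(row["Parch"]) + 1
    return {**row, "Title": title, "FamilySize": family_size, "IsAlone": family_size == 1}


passenger = {"Name": "Braund, Mr. Owen Harris", "SibSp": "1", "Parch": "0"}
print(engineer_features(passenger))
```

Titles such as "Mr", "Mrs", "Miss" and "Master" encode both sex and approximate age, which is why they are often used to impute missing `Age` values.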
CIFAR-10 Image Dataset
Source: Kaggle / University of Toronto | Updated: Varies | Size: 170 MB

A collection of 60,000 32x32 color images in 10 classes (6,000 images per class), split into 50,000 training and 10,000 test images. A standard benchmark for image classification.

Image Classification
Computer Vision
Deep Learning
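In the "CIFAR-10 python version" distributed by the University of Toronto, each batch file is a pickled dictionary whose `data` entry stores every image as a flat row of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, each channel in row-major order. Kaggle mirrors may instead ship PNG files, so this sketch assumes the pickled batch format. The helpers `load_batch` and `row_to_image` are hypothetical names; `row_to_image` rebuilds a 32x32 grid of (R, G, B) tuples using only the standard library.

```python
import pickle
from typing import List, Sequence, Tuple

SIDE = 32
PLANE = SIDE * SIDE  # 1,024 values per colour channel


def load_batch(path: str):
    """Load one pickled batch (e.g. data_batch_1); NumPy must be installed to unpickle it."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


def row_to_image(row: Sequence[int]) -> List[List[Tuple[int, int, int]]]:
    """Hypothetical helper: turn a 3,072-value row into a 32x32 grid of (R, G, B) pixels."""
    if len(row) != 3 * PLANE:
        raise ValueError(f"expected {3 * PLANE} values, got {len(row)}")
    image = []
    for y in range(SIDE):
        line = []
        for x in range(SIDE):
            i = y * SIDE + x  # row-major position within each channel plane
            line.append((row[i], row[PLANE + i], row[2 * PLANE + i]))
        image.append(line)
    return image
```

With NumPy available, the same conversion for a whole batch is `data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)`, which yields an array of shape (N, 32, 32, 3).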
IMDB Dataset of 50K Movie Reviews
Source: Kaggle / Stanford | Updated: 4 years ago | Size: 25 MB

A dataset for binary sentiment classification containing 50,000 highly polar movie reviews, split evenly into 25,000 for training and 25,000 for testing.

NLP
Sentiment Analysis
Text Data
Global Terrorism Database (GTD)
Source: Kaggle / START Consortium | Updated: Annually | Size: 150 MB

A comprehensive open-source database of terrorist events around the world, covering 1970 through 2017 in this release.

Social Science
Event Data
Geospatial
Credit Card Fraud Detection
Source: Kaggle | Updated: 7 years ago | Size: 144 MB

A highly imbalanced dataset of credit card transactions made by European cardholders in September 2013: only 492 of the 284,807 transactions (about 0.17%) are fraudulent.

Fraud Detection
Imbalanced Data
Classification
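Imbalance this severe makes plain accuracy misleading. Using the published counts of 492 fraudulent transactions out of 284,807, the sketch below computes the accuracy of a trivial model that labels every transaction as legitimate, and "balanced" class weights of n_samples / (n_classes * n_class_samples), the formula scikit-learn documents for `class_weight="balanced"`, which a weighted loss could use to compensate. The function names are hypothetical.

```python
TOTAL = 284_807
FRAUD = 492
LEGIT = TOTAL - FRAUD


def majority_baseline_accuracy(total: int, minority: int) -> float:
    """Accuracy of a model that always predicts the majority (legitimate) class."""
    return (total - minority) / total


def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


print(f"fraud rate: {FRAUD / TOTAL:.3%}")
print(f"all-legitimate baseline accuracy: {majority_baseline_accuracy(TOTAL, FRAUD):.3%}")
print(balanced_class_weights({0: LEGIT, 1: FRAUD}))
```

The baseline scores above 99.8% accuracy while catching no fraud at all, which is why precision, recall or the area under the precision-recall curve are the more informative metrics for this dataset.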
Iris Species
Source: Kaggle / UCI | Updated: 7 years ago | Size: 4 KB

A classic dataset for multiclass classification. It contains 3 classes of 50 instances each, where each class is a species of iris plant, described by four measurements: sepal length, sepal width, petal length and petal width.

Classification
Biology
Beginner
MNIST Original (Digit Recognizer)
Source: Kaggle / Yann LeCun | Updated: Competition | Size: 10 MB

A database of 70,000 28x28 grayscale images of handwritten digits (0 to 9), widely used as a benchmark for training and evaluating image classification systems.

Image Classification
Handwriting
Deep Learning
Fashion MNIST
Source: Kaggle / Zalando Research | Updated: 6 years ago | Size: 30 MB

A dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes.

Image Classification
Fashion
Deep Learning
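Both MNIST and Fashion MNIST represent each example as 784 (28 x 28) pixel intensities. Assuming a Kaggle-style CSV layout in which each row holds the label followed by 784 grayscale values in row-major order, a minimal standard-library parser might look like this. The helpers `parse_row` and `read_examples`, the label-first column order and the synthetic header are assumptions to check against the header of the file you download; unlabeled test files would need the label handling removed.

```python
import csv
import io
from typing import List, Tuple

SIDE = 28
N_PIXELS = SIDE * SIDE  # 784


def parse_row(fields: List[str]) -> Tuple[int, List[List[int]]]:
    """Hypothetical helper: split a label-first CSV row into (label, 28x28 image)."""
    if len(fields) != N_PIXELS + 1:
        raise ValueError(f"expected {N_PIXELS + 1} fields, got {len(fields)}")
    label = int(fields[0])
    pixels = [int(v) for v in fields[1:]]
    # Rebuild 28 rows of 28 pixels each (row-major order).
    image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, image


def read_examples(stream) -> List[Tuple[int, List[List[int]]]]:
    """Read all labeled examples from a CSV stream, skipping the header line."""
    reader = csv.reader(stream)
    next(reader)  # header: label, then one column per pixel
    return [parse_row(row) for row in reader]


# Synthetic one-row CSV standing in for a real file.
header = "label," + ",".join(f"pixel{i}" for i in range(N_PIXELS))
sample = "7," + ",".join(str(i % 256) for i in range(N_PIXELS))
examples = read_examples(io.StringIO(header + "\n" + sample + "\n"))
label, image = examples[0]
print(label, len(image), len(image[0]))
```

For a real file, open it with `open(path, newline="")` and pass the file object to `read_examples`.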
Wine Quality Dataset
Source: Kaggle / UCI | Updated: 7 years ago | Size: 240 KB

Two datasets are included, covering red and white vinho verde wine samples from the north of Portugal. The goal is to model wine quality (a sensory score between 0 and 10) from physicochemical test results.

Regression
Classification
Food Science
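The quality score can be used directly as a regression target or thresholded into classes, which is why both tags apply. The sketch below assumes the original UCI files, which are semicolon-delimited with a `quality` column; Kaggle mirrors may use commas, so the delimiter is a parameter. The helper names and the cut-off of 7 for a "good" wine are illustrative assumptions, not part of the dataset, and the in-memory sample uses only a reduced, synthetic subset of the real columns.

```python
import csv
import io
from typing import List, Tuple


def load_wine(stream, delimiter: str = ";") -> Tuple[List[List[float]], List[int]]:
    """Return (feature rows, quality scores) from a wine-quality CSV stream."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    features, quality = [], []
    for row in reader:
        quality.append(int(row.pop("quality")))  # target column
        features.append([float(v) for v in row.values()])  # remaining columns, in file order
    return features, quality


def to_binary_labels(quality: List[int], threshold: int = 7) -> List[int]:
    """Illustrative assumption: label a wine 'good' (1) if its score >= threshold."""
    return [1 if q >= threshold else 0 for q in quality]


# Synthetic sample with a reduced column set (the real files have 11 input features).
sample = (
    '"fixed acidity";"volatile acidity";"alcohol";"quality"\n'
    "7.4;0.7;9.4;5\n"
    "7.3;0.65;10;7\n"
)
X, y = load_wine(io.StringIO(sample))
print(X, y, to_binary_labels(y))
```

Keeping `y` as integers supports the regression framing, while `to_binary_labels` gives the classification framing without changing the features.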