Overview

The GalaxiesML dataset is designed for machine learning applications in astrophysics. It includes 286,401 galaxy images from the Hyper Suprime-Cam (HSC) Survey Public Data Release 2 (PDR2) in five filters: g, r, i, z, and y. Spectroscopically confirmed redshifts serve as ground truth, which makes the dataset well suited to tasks such as redshift estimation and galaxy morphology classification. It is also relevant to upcoming large-scale surveys such as LSST and Euclid.

HSC Survey Field Map
Map from the Hyper Suprime-Cam Team: https://hsc.mtk.nao.ac.jp/ssp/survey/#survey_fields

Features

  • 286,401 galaxy images in five photometric bands (g, r, i, z, y).
  • Spectroscopic redshifts for each galaxy, with redshift values ranging from 0.01 to 4.
  • Morphological parameters such as Sérsic index, half-light radius, and ellipticity.
  • Data provided in HDF5 format, along with CSV metadata.
  • Training, validation, and test splits available for machine learning applications.
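
As a starting point, the sketch below selects galaxies with secure spectroscopic redshifts from the CSV metadata. It assumes that the CSV header uses the column names listed under Tabular Data (object_id, specz_redshift, specz_flag_homogeneous) and that the flag is stored as the text TRUE/FALSE. The function name secure_redshift_rows and the three toy rows are hypothetical, so check the actual files before relying on it.

```python
import csv
import io


def secure_redshift_rows(csv_file, z_min=0.01, z_max=4.0):
    """Yield metadata rows with a secure spec-z inside [z_min, z_max].

    Assumption: column names follow the Tabular Data table and the
    homogenized flag is written as TRUE/FALSE (case-insensitive).
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        secure = row["specz_flag_homogeneous"].strip().lower() == "true"
        z = float(row["specz_redshift"])
        if secure and z_min <= z <= z_max:
            yield row


# Tiny in-memory stand-in for the real metadata file (made-up values).
sample = io.StringIO(
    "object_id,specz_redshift,specz_flag_homogeneous\n"
    "1,0.35,True\n"
    "2,1.20,False\n"
    "3,5.10,True\n"
)
selected = [int(r["object_id"]) for r in secure_redshift_rows(sample)]
print(selected)  # [1]
```

Row 2 is dropped because its flag is insecure, and row 3 because its redshift falls outside the 0.01 to 4 range.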

Galaxy Images

Example 64x64-pixel galaxy images from GalaxiesML in the five filters (g, r, i, z, y). Images are also available at 127x127 pixels.

Data Files

The dataset is provided at two image resolutions, 127x127 pixels and 64x64 pixels. The following files are available for each resolution:

127x127 Pixel Image Files

64x64 Pixel Image Files

Galaxy Morphology Parameters

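The dataset includes several galaxy morphology parameters, such as Sérsic index, half-light radius, and ellipticity, each measured separately in every filter (see Tabular Data for the full list).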
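
To make two of these parameters concrete, here is a minimal sketch of the standard Sérsic surface-brightness profile, I(R) = I_e exp(-b_n [(R/R_e)^(1/n) - 1]), in which the half-light radius R_e encloses half of the total light and n is the Sérsic index. The constant b_n is computed with the Ciotti & Bertin (1999) asymptotic approximation, and ellipticity uses the common definition 1 - b/a. This only illustrates textbook definitions; the fitting procedure and conventions used to produce the GalaxiesML values may differ, and the function names are hypothetical.

```python
import math


def sersic_bn(n):
    """Approximate b_n so that R_e encloses half the total light.

    Ciotti & Bertin (1999) asymptotic expansion; it loses accuracy
    at very small Sérsic indices.
    """
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n**2)


def sersic_intensity(r, i_e, r_e, n):
    """Sérsic profile I(R) = I_e * exp(-b_n * ((R / R_e)**(1/n) - 1))."""
    return i_e * math.exp(-sersic_bn(n) * ((r / r_e) ** (1.0 / n) - 1.0))


def ellipticity(major_axis, minor_axis):
    """Common definition e = 1 - b/a (assumed; the dataset's convention may differ)."""
    return 1.0 - minor_axis / major_axis


print(round(sersic_bn(1.0), 3))  # 1.678 (exponential disk)
print(round(sersic_bn(4.0), 3))  # 7.669 (de Vaucouleurs profile)
print(ellipticity(2.0, 1.0))     # 0.5
```

By construction the profile returns exactly I_e at R = R_e, which is a quick sanity check when experimenting with other approximations for b_n.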

GitHub Examples

Explore our Convolutional Neural Network (CNN) example on GitHub to see how the GalaxiesML dataset can be used for galaxy classification and redshift estimation. The example provides a full codebase that illustrates how machine learning can be applied to astrophysical data.

View CNN Example on GitHub
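
Whatever model produces the redshift estimates, the spectroscopic redshifts can be used to score them. The sketch below computes three statistics widely used in photometric-redshift work, all on the normalized residual dz = (z_pred - z_spec) / (1 + z_spec): the mean bias, the normalized median absolute deviation (NMAD = 1.4826 x median|dz - median(dz)|), and the fraction of outliers with |dz| > 0.15. The 0.15 threshold is a common convention and may not match the one used in the CNN repository or the GalaxiesML paper; photoz_metrics is a hypothetical helper, not part of that code.

```python
import statistics


def photoz_metrics(z_pred, z_spec, outlier_cut=0.15):
    """Bias, NMAD and outlier fraction of dz = (z_pred - z_spec) / (1 + z_spec)."""
    if len(z_pred) != len(z_spec) or not z_spec:
        raise ValueError("need two non-empty sequences of equal length")
    dz = [(p - s) / (1.0 + s) for p, s in zip(z_pred, z_spec)]
    bias = statistics.mean(dz)
    med = statistics.median(dz)
    # 1.4826 rescales the median absolute deviation to a Gaussian sigma.
    nmad = 1.4826 * statistics.median([abs(d - med) for d in dz])
    outlier_frac = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return {"bias": bias, "nmad": nmad, "outlier_frac": outlier_frac}


metrics = photoz_metrics([0.5, 1.1, 2.0, 0.5], [0.5, 1.0, 2.0, 0.1])
print(metrics["outlier_frac"])  # 0.25
```

NMAD is preferred over a plain standard deviation here because a few catastrophic outliers would otherwise dominate the scatter estimate.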

Access the Dataset

The GalaxiesML dataset is intended for both astrophysicists and data scientists. It pairs a large collection of multi-band galaxy images with photometric measurements and spectroscopic redshifts, making it suitable for galaxy classification, redshift estimation, and other machine learning applications in astrophysics. The dataset is available on Zenodo and can be accessed using the link below.

Access Dataset

Download Research Paper

Our latest research paper uses the GalaxiesML dataset to introduce a novel method based on Denoising Diffusion Probabilistic Models (DDPM), conditioned on redshift, for generating realistic galaxy images. We demonstrate that the DDPM effectively captures the physical characteristics and evolutionary changes of galaxies, showing how machine learning can deepen our understanding of cosmic phenomena.
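
For readers unfamiliar with diffusion models, the sketch below shows the generic DDPM forward (noising) process from Ho et al. (2020), using their linear beta schedule from 1e-4 to 0.02 over 1,000 steps: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, with eps drawn from a standard normal. A redshift-conditioned model would additionally feed the redshift into the network that predicts eps from x_t and t; that network is not shown. The schedule, step count, and function names here are assumptions for illustration, not the configuration used in the paper.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
alpha_bars = linear_alpha_bars()
pixels = [0.2, -0.1, 0.5]  # toy flattened image with made-up values
x_t = q_sample(pixels, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because alpha_bar_t shrinks toward zero, late steps are almost pure noise; the generative model learns to reverse this process, and conditioning on redshift lets it steer the reversal toward galaxies at a chosen redshift.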


Tabular Data

In the column names below, {band} stands for one of the five HSC filters: g, r, i, z, or y.

Column Name | Units | Description
--- | --- | ---
object_id | | Unique 64-bit integer object ID from the HSC survey
coord | (deg, deg, deg) | Coordinate used in coneSearch(coord, RA, DEC, RADIUS)
ra | deg | RA (J2000.0) of the image center
dec | deg | Dec (J2000.0) of the image center
{band}_cmodel_mag | mag | CModel magnitude of the central galaxy in filter {band}
{band}_cmodel_magsigma | mag | Uncertainty in the magnitude in filter {band}
skymap_id | | Location of the galaxy in the internal survey position definition (tract, patch)
specz_name | | Name(s) of the galaxy in the spectroscopic survey(s)
specz_flag_homogeneous | | Homogenized spec-z flag (TRUE = secure, FALSE = insecure)
specz_mag_i | mag | i-band magnitude of the galaxy in the spectroscopic survey
specz_ra | deg | RA (J2000.0) of the galaxy in the spectroscopic survey
specz_dec | deg | Dec (J2000.0) of the galaxy in the spectroscopic survey
specz_redshift | | Spectroscopic redshift
specz_redshift_err | | Spectroscopic redshift uncertainty
{band}_central_image_pop_15px_rad | | Photometry within a 15-pixel radius in filter {band}
{band}_central_image_pop_10px_rad | | Photometry within a 10-pixel radius in filter {band}
{band}_central_image_pop_5px_rad | | Photometry within a 5-pixel radius in filter {band}
{band}_ellipticity | | Ellipticity of the object in filter {band}
{band}_half_light_radius | pixels | Radius containing 50% of the total flux in filter {band}
{band}_isophotal_area | pixels² | Isophotal area of the object in filter {band}
{band}_major_axis | pixels | Major axis of the detected object in filter {band}
{band}_minor_axis | pixels | Minor axis of the detected object in filter {band}
{band}_peak_surface_brightness | mag/arcsec² | Peak surface brightness in filter {band}
{band}_petro_rad | pixels | Petrosian radius in filter {band}
{band}_pos_angle | deg | Position angle of the object in filter {band}
{band}_sersic_index | | Sérsic index in filter {band}
{band}_total_galaxies | | Total number of galaxies detected in filter {band}
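
Because many columns are per-band templates, it can help to expand them into concrete names and to convert the CModel magnitudes to fluxes. The sketch below does both for a subset of the templates listed in the table. It assumes the magnitudes are on the AB system, as is standard for HSC, so that f = 10^(-0.4 (m - 31.4)) nJy (the 3631 Jy AB zero point expressed in nJy, to within rounding), with the uncertainty propagated to first order as sigma_f = 0.4 ln(10) f sigma_m. The helper names expand_band_columns and ab_mag_to_flux_njy are hypothetical.

```python
import math

# The five GalaxiesML filters.
BANDS = ("g", "r", "i", "z", "y")

# A subset of the per-band column templates from the Tabular Data table.
BAND_TEMPLATES = (
    "{band}_cmodel_mag",
    "{band}_cmodel_magsigma",
    "{band}_ellipticity",
    "{band}_half_light_radius",
    "{band}_petro_rad",
    "{band}_sersic_index",
)


def expand_band_columns(templates=BAND_TEMPLATES, bands=BANDS):
    """Return concrete column names such as 'g_cmodel_mag', grouped by band."""
    return [template.format(band=band) for band in bands for template in templates]


def ab_mag_to_flux_njy(mag, mag_err):
    """Convert an AB magnitude and its 1-sigma error to flux and error in nJy.

    Assumes AB magnitudes: f = 10**(-0.4 * (m - 31.4)) nJy.
    First-order error propagation: sigma_f = 0.4 * ln(10) * f * sigma_m.
    """
    flux = 10.0 ** (-0.4 * (mag - 31.4))
    flux_err = 0.4 * math.log(10.0) * flux * mag_err
    return flux, flux_err


columns = expand_band_columns()
print(columns[:2])  # ['g_cmodel_mag', 'g_cmodel_magsigma']
flux, flux_err = ab_mag_to_flux_njy(23.9, 0.1)  # 23.9 AB mag is about 1 microjansky
```

Working in flux rather than magnitude is often convenient for faint sources, where magnitude errors become large and asymmetric.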

Citations

If you use this dataset in your work, please cite the following references:

License

This dataset is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).