The GalaxiesML dataset is designed for machine learning applications in astrophysics. It includes 286,401 galaxy images from the Hyper Suprime-Cam (HSC) Survey PDR2 in five filters: g, r, i, z, y. Spectroscopically confirmed redshifts serve as ground truth, making the dataset well suited to tasks such as redshift estimation and galaxy morphology classification. The dataset is intended to support machine learning work for large-scale surveys such as LSST and Euclid.
Examples of 64x64 pixel galaxy images from GalaxiesML in the five filters (g, r, i, z, y). Images at 127x127 pixels are also available.
The dataset is split into 127x127 pixel and 64x64 pixel image resolutions. The following files are available for each resolution:
- 5x127x127_training_with_morphology.hdf5: Training set with images and morphological parameters.
- 5x127x127_training_with_morphology.csv: Metadata and morphology for the training set.
- 5x127x127_validation_with_morphology.hdf5: Validation set with images and morphological parameters.
- 5x127x127_validation_with_morphology.csv: Metadata and morphology for the validation set.
- 5x127x127_testing_with_morphology.hdf5: Testing set with images and morphological parameters.
- 5x127x127_testing_with_morphology.csv: Metadata and morphology for the testing set.
- 5x64x64_training_with_morphology.hdf5: Training set with images and morphological parameters.
- 5x64x64_validation_with_morphology.hdf5: Validation set with images and morphological parameters.
- 5x64x64_testing_with_morphology.hdf5: Testing set with images and morphological parameters.

The dataset includes several galaxy morphology parameters, such as ellipticity, half-light radius, Petrosian radius, and Sérsic index. The full set of columns is listed in the table further down.
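The file names listed above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is only an illustration of that pattern: the helper name `galaxiesml_filename` is hypothetical and not part of any official GalaxiesML tooling, and it assumes that CSV files exist only for the 127x127 resolution because those are the only CSV files listed here.

```python
RESOLUTIONS = (127, 64)                      # image sizes listed on this page
SPLITS = ("training", "validation", "testing")


def galaxiesml_filename(resolution: int, split: str, ext: str = "hdf5") -> str:
    """Build a file name following the pattern in the file list above.

    Hypothetical helper for illustration; it only formats strings and
    does not check that the file exists anywhere.
    """
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}")
    if ext not in ("hdf5", "csv"):
        raise ValueError("ext must be 'hdf5' or 'csv'")
    # Assumption: CSV files are listed only for the 127x127 resolution.
    if ext == "csv" and resolution != 127:
        raise ValueError("CSV files are only listed for the 127x127 resolution")
    return f"5x{resolution}x{resolution}_{split}_with_morphology.{ext}"


if __name__ == "__main__":
    for split in SPLITS:
        print(galaxiesml_filename(64, split))
```

The leading `5` in each name matches the five filters, and the two repeated numbers are the image height and width in pixels.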
Explore our Convolutional Neural Network (CNN) example on GitHub to see how to leverage the GalaxiesML dataset for galaxy classification and redshift estimation tasks. This example provides a comprehensive codebase to help better understand machine learning applications in astrophysics.
View CNN Example on GitHub

The GalaxiesML dataset is a valuable resource for both astrophysicists and data scientists. It provides a large collection of galaxy images along with detailed photometric data and spectroscopic redshift measurements. Whether you're working on galaxy classification, redshift estimation, or other machine learning applications in astrophysics, this dataset offers the data you need to drive your research forward. The dataset is available on Zenodo and can be accessed using the link below.
Access Dataset

Our latest research paper uses the GalaxiesML dataset to introduce a method based on Denoising Diffusion Probabilistic Models (DDPMs), conditioned on redshift, for generating realistic galaxy images. We demonstrate that DDPMs capture the physical characteristics and evolutionary changes of galaxies, enhancing our understanding of cosmic phenomena through machine learning.
Download Research Paper

Column Name | Units | Description |
---|---|---|
object_id | | Object ID from the HSC survey; a unique 64-bit integer |
coord | (deg, deg, deg) | Coordinate used in coneSearch(coord, RA, DEC, RADIUS) |
ra | deg | RA (J2000.0) of the image center |
dec | deg | DEC (J2000.0) of the image center |
{band}_cmodel_mag | mag | Magnitude of the central galaxy in filter {band} |
{band}_cmodel_magsigma | mag | Uncertainty in the magnitude in filter {band} |
skymap_id | | Location of the galaxy in the internal survey position definition (tract, patch) |
specz_name | | Name(s) of the galaxy in the spectroscopic survey(s) |
specz_flag_homogeneous | | Homogenized spec-z flag (TRUE = secure, FALSE = insecure) |
specz_mag_i | mag | i-band magnitude of the galaxy in the spectroscopic survey |
specz_ra | deg | RA (J2000.0) of the galaxy in the spectroscopic survey |
specz_dec | deg | DEC (J2000.0) of the galaxy in the spectroscopic survey |
specz_redshift | | Spectroscopic redshift |
specz_redshift_err | | Spectroscopic redshift uncertainty |
{band}_central_image_pop_15px_rad | | Photometry within a 15-pixel radius in filter {band} |
{band}_central_image_pop_10px_rad | | Photometry within a 10-pixel radius in filter {band} |
{band}_central_image_pop_5px_rad | | Photometry within a 5-pixel radius in filter {band} |
{band}_ellipticity | | Ellipticity of the object in filter {band} |
{band}_half_light_radius | pixels | Radius containing 50% of the total flux in filter {band} |
{band}_isophotal_area | pixels² | Isophotal area of the object in filter {band} |
{band}_major_axis | pixels | Major axis of the detected object in filter {band} |
{band}_minor_axis | pixels | Minor axis of the detected object in filter {band} |
{band}_peak_surface_brightness | mag/arcsec² | Peak surface brightness in filter {band} |
{band}_petro_rad | pixels | Petrosian radius in filter {band} |
{band}_pos_angle | deg | Position angle of the object in filter {band} |
{band}_sersic_index | | Sérsic index in filter {band} |
{band}_total_galaxies | | Total number of galaxies detected in filter {band} |

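
The `{band}` placeholder in the column names stands for each of the five filters, and `specz_flag_homogeneous` marks which redshifts are secure. The sketch below shows one way these two conventions might be used with Python's standard `csv` module. The helper names `expand_band_columns` and `secure_redshifts` are hypothetical, and the code assumes the flag is stored as text such as `TRUE`/`FALSE`; the real CSV files may encode it differently, so inspect a few rows first. The sample rows are synthetic values made up for the demonstration.

```python
import csv
import io

BANDS = ("g", "r", "i", "z", "y")  # the five HSC filters named above


def expand_band_columns(template: str) -> list[str]:
    """Replace the {band} placeholder with each filter name in turn."""
    return [template.replace("{band}", band) for band in BANDS]


def secure_redshifts(csv_file) -> list[tuple[str, float]]:
    """Return (object_id, specz_redshift) for rows flagged as secure.

    Assumption: the flag is serialized as text like TRUE/FALSE.
    object_id is kept as a string so the 64-bit ID is never rounded.
    """
    selected = []
    for row in csv.DictReader(csv_file):
        flag = row["specz_flag_homogeneous"].strip().lower()
        if flag in ("true", "t", "1"):
            selected.append((row["object_id"], float(row["specz_redshift"])))
    return selected


if __name__ == "__main__":
    print(expand_band_columns("{band}_sersic_index"))

    # Synthetic rows for illustration only; not real catalogue entries.
    sample = io.StringIO(
        "object_id,specz_redshift,specz_flag_homogeneous\n"
        "1001,0.52,TRUE\n"
        "1002,1.37,FALSE\n"
    )
    print(secure_redshifts(sample))
```

With a real metadata file, the same function could be called on an open file handle, for example `open("5x127x127_training_with_morphology.csv", newline="")`.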
If you use this dataset in your work, please cite the following references:
This dataset is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).