NotPlaNET: Removing False Positives from Planet Hunters TESS with Machine Learning

2405.18278

Published 5/29/2024 by Valentina Tardugno Poleo (NYU), Nora Eisner (CCA), David W. Hogg (NYU, CCA)

Abstract

Differentiating between real transit events and false positive signals in photometric time series data is a bottleneck in the identification of transiting exoplanets, particularly long-period planets. This differentiation typically requires visual inspection of a large number of transit-like signals to rule out instrumental and astrophysical false positives that mimic planetary transit signals. We build a one-dimensional convolutional neural network (CNN) to separate eclipsing binaries and other false positives from potential planet candidates, reducing the number of light curves that require human vetting. Our CNN is trained using the TESS light curves that were identified by Planet Hunters citizen scientists as likely containing a transit. We also include the background flux and centroid information. The light curves are visually inspected and labeled by project scientists and are minimally pre-processed, with only normalization and data augmentation taking place before training. The median percentage of contaminants flagged across the test sectors is 18% with a maximum of 37% and a minimum of 10%. Our model keeps 100% of the planets for 16 of the 18 test sectors, while incorrectly flagging one planet candidate (0.3%) for one sector and two (0.6%) for the remaining sector. Our method shows potential to reduce the number of light curves requiring manual vetting by up to a third with minimal misclassification of planet candidates.

Overview

This paper presents a machine learning approach called "NotPlaNET" to remove false positives from exoplanet candidate detections by the Planet Hunters TESS citizen science project.
The researchers used a convolutional neural network to classify light curves (measurements of a star's brightness over time) as either true exoplanet transits or other phenomena that can mimic exoplanet transits.
The goal was to improve the efficiency of exoplanet discovery by automatically filtering out false positives, which are common in large-scale sky surveys like TESS.

Plain English Explanation

The search for planets orbiting other stars, known as exoplanets, is an active area of astronomical research. One way to find exoplanets is by looking for slight dips in a star's brightness as a planet passes in front of it, known as a "transit." The Planet Hunters TESS citizen science project enlists volunteers to help identify these transit signals in data from NASA's Transiting Exoplanet Survey Satellite (TESS).

However, not every brightness dip corresponds to a genuine exoplanet - some are caused by other phenomena that can mimic exoplanet transits, such as binary star systems or instrument artifacts. These false positives can make it challenging to confirm actual exoplanet discoveries. To address this, the researchers developed a machine learning system called "NotPlaNET" that can automatically distinguish true exoplanet transits from these other types of signals in the TESS data.

The key idea is to train a convolutional neural network to analyze the shape and pattern of the light curve (the graph of a star's brightness over time) and classify it as either a real exoplanet transit or a false positive. This allows the researchers to filter out the false positives, making the process of discovering new exoplanets more efficient.

Technical Explanation

The researchers first assembled a dataset of TESS light curves that Planet Hunters TESS citizen scientists had identified as likely containing a transit. Project scientists then visually inspected and labeled these light curves. Pre-processing was minimal: only normalization and data augmentation were applied before training.
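As a rough illustration of this kind of minimal pre-processing, the sketch below normalizes a light curve and applies one simple augmentation. The abstract does not say which normalization or augmentations were used, so median normalization and random time reversal are assumptions here, and `normalize`, `augment`, and the toy flux values are hypothetical.

```python
import random
from statistics import median


def normalize(flux):
    """Divide by the median so the out-of-transit baseline sits near 1.0.

    Median normalization is an assumption; the paper's exact scheme may differ.
    """
    m = median(flux)
    if m == 0:
        raise ValueError("median flux is zero; cannot normalize")
    return [f / m for f in flux]


def augment(flux, rng):
    """Return a copy that is time-reversed with probability 0.5.

    Time reversal is an assumed augmentation chosen for illustration only.
    """
    out = list(flux)
    if rng.random() < 0.5:
        out.reverse()
    return out


# Toy light curve (made-up values) with a shallow dip in the middle.
raw = [1000.0, 1002.0, 998.0, 990.0, 991.0, 1001.0, 999.0]
norm = normalize(raw)
aug = augment(norm, random.Random(0))
print(min(norm))  # deepest point of the dip, relative to the baseline
```

Time reversal keeps the set of flux values unchanged, so it adds variety to the training set without altering the depth of a transit-like dip.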

They then trained a one-dimensional convolutional neural network (CNN) to classify the light curves. The CNN takes the light curve, together with the background flux and centroid information, as input and learns features that separate eclipsing binaries and other false positives from potential planet candidates. The model was evaluated on 18 test sectors.
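To make the idea of a one-dimensional CNN concrete, here is a minimal forward pass in plain Python: one convolution layer, a ReLU, global max pooling, and a sigmoid output. This is a sketch under stated assumptions, not the NotPlaNET architecture: the layer sizes, the hand-picked weights, the three-channel toy input (flux, background, centroid), and the reading of the output as a false-positive score are all hypothetical.

```python
import math


def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation with stride 1.

    x: list of input channels, each a list of floats of equal length.
    kernels: [out_channels][in_channels][kernel_size] weights.
    """
    n = len(x[0])
    k = len(kernels[0][0])
    out = []
    for w, b in zip(kernels, bias):
        row = []
        for t in range(n - k + 1):
            s = b
            for c, wc in enumerate(w):
                for j in range(k):
                    s += wc[j] * x[c][t + j]
            row.append(s)
        out.append(row)
    return out


def relu(feature_maps):
    return [[max(0.0, v) for v in row] for row in feature_maps]


def global_max_pool(feature_maps):
    return [max(row) for row in feature_maps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, kernels, bias, dense_w, dense_b):
    """Conv -> ReLU -> global max pool -> dense -> sigmoid score in (0, 1)."""
    feats = global_max_pool(relu(conv1d(x, kernels, bias)))
    z = sum(w * f for w, f in zip(dense_w, feats)) + dense_b
    return sigmoid(z)


# Toy three-channel input (made-up values): flux, background, centroid.
flux = [1.0, 1.0, 0.98, 0.98, 1.0, 1.0]
background = [0.0] * 6
centroid = [0.0] * 6
x = [flux, background, centroid]

# One hand-picked (untrained) filter that responds to curvature in the flux.
kernels = [[[0.5, -1.0, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
score = forward(x, kernels, bias=[0.0], dense_w=[10.0], dense_b=0.0)
print(score)
```

In a trained network the kernel weights would be learned from labeled light curves rather than set by hand, and there would typically be several stacked convolution layers.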

On the test sectors, the NotPlaNET CNN flagged a median of 18% of contaminants per sector, with a minimum of 10% and a maximum of 37%. It kept 100% of the planets in 16 of the 18 test sectors, incorrectly flagging one planet candidate (0.3%) in one sector and two (0.6%) in the remaining sector. This suggests the approach could reduce the number of light curves requiring manual vetting by up to a third, with minimal misclassification of planet candidates.
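Per-sector figures like these could be computed along the following lines. This is a sketch that assumes each percentage is a fraction of its own class (contaminants flagged out of all contaminants, planets kept out of all planets), which the abstract does not state explicitly. The function name `sector_metrics` and the two toy sectors are hypothetical and are not the paper's data.

```python
from statistics import median


def sector_metrics(is_planet, flagged):
    """Summarize one sector.

    is_planet: True for a planet candidate, False for a contaminant.
    flagged: True if the model flags the light curve as a false positive.
    """
    n_planets = sum(is_planet)
    n_contam = len(is_planet) - n_planets
    contam_flagged = sum(1 for p, f in zip(is_planet, flagged) if not p and f)
    planets_flagged = sum(1 for p, f in zip(is_planet, flagged) if p and f)
    return {
        "contaminants_flagged": contam_flagged / n_contam if n_contam else 0.0,
        "planets_kept": 1.0 - planets_flagged / n_planets if n_planets else 1.0,
    }


# Two made-up sectors, purely for illustration.
sector_a = sector_metrics(
    is_planet=[True, True, False, False, False, False],
    flagged=[False, False, True, False, False, False],
)
sector_b = sector_metrics(
    is_planet=[True, False, False],
    flagged=[True, True, False],
)

median_flagged = median(
    s["contaminants_flagged"] for s in (sector_a, sector_b)
)
print(sector_a, sector_b, median_flagged)
```

Reporting the two quantities separately matters here: a filter is only useful for vetting if it removes contaminants while keeping nearly all planet candidates.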

Critical Analysis

The paper provides a thorough description of the NotPlaNET system and demonstrates its effectiveness on the Planet Hunters TESS dataset. However, the authors acknowledge that the model was trained and evaluated on a limited dataset, and its performance may not generalize to other types of light curve data or exoplanet detection surveys.

Additionally, the authors do not explore potential biases in the labels used to train the model, which could impact its accuracy. The training set contains only light curves that citizen scientists identified as likely containing a transit, and the labels come from visual inspection by project scientists, so the model may learn to reproduce systematic human vetting preferences rather than truly distinguishing exoplanet transits from other phenomena.

Further research could investigate ways to reduce these potential biases, such as using multiple independent labels per light curve or developing more objective criteria for true exoplanet transits. Applying the NotPlaNET approach to other photometric datasets, such as light curves from the Kepler mission, could also help validate its broader applicability.

Conclusion

The NotPlaNET system presented in this paper demonstrates the potential of machine learning techniques, specifically convolutional neural networks, to improve the efficiency of exoplanet discovery from large-scale sky surveys. By automatically filtering out false positive detections, NotPlaNET could accelerate the pace of exoplanet research and lead to the identification of more genuine exoplanet candidates for further follow-up and characterization. As exoplanet search efforts continue to grow, methods like this will become increasingly valuable for making sense of the vast amounts of data collected by modern astronomical instruments.

Related Papers

Machine learning for exoplanet detection in high-contrast spectroscopy: Combining cross correlation maps and deep learning on medium-resolution integral-field spectra

Rakesh Nath-Ranga, Olivier Absil, Valentin Christiaens, Emily O. Garvin

The advent of high-contrast imaging instruments combined with medium-resolution spectrographs allows spectral and temporal dimensions to be combined with spatial dimensions to detect and potentially characterize exoplanets with higher sensitivity. We develop a new method to effectively leverage the spectral and spatial dimensions in integral-field spectroscopy (IFS) datasets using a supervised deep-learning algorithm to improve the detection sensitivity to high-contrast exoplanets. We begin by applying a data transform whereby the IFS datasets are replaced by cross-correlation coefficient tensors obtained by cross-correlating our data with young gas giant spectral template spectra. This transformed data is then used to train machine learning (ML) algorithms. We train a 2D CNN and 3D LSTM with our data. We compare the ML models with a non-ML algorithm, based on the STIM map of arXiv:1810.06895. We test our algorithms on simulated young gas giants in a dataset that contains no known exoplanet, and explore the sensitivity of algorithms to detect these exoplanets at contrasts ranging from 1e-3 to 1e-4 at different radial separations. We quantify the sensitivity using modified receiver operating characteristic curves (mROC). We discover that the ML algorithms produce fewer false positives and have a higher true positive rate than the STIM-based algorithm, and the true positive rate of ML algorithms is less impacted by changing radial separation. We discover that the velocity dimension is an important differentiating factor. Through this paper, we demonstrate that ML techniques have the potential to improve the detection limits and reduce false positives for directly imaged planets in IFS datasets, after transforming the spectral dimension into a radial velocity dimension through a cross-correlation operation.

5/24/2024

cs.LG

Improving Earth-like planet detection in radial velocity using deep learning

Yinan Zhao, Xavier Dumusque, Michael Cretignier, Andrew Collier Cameron, David W. Latham, Mercedes López-Morales, Michel Mayor, Alessandro Sozzetti, Rosario Cosentino, Isidro Gómez-Vargas, Francesco Pepe, Stephane Udry

Many novel methods have been proposed to mitigate stellar activity for exoplanet detection as the presence of stellar activity in radial velocity (RV) measurements is the current major limitation. Unlike traditional methods that model stellar activity in the RV domain, more methods are moving in the direction of disentangling stellar activity at the spectral level. The goal of this paper is to present a novel convolutional neural network-based algorithm that efficiently models stellar activity signals at the spectral level, enhancing the detection of Earth-like planets. We trained a convolutional neural network to build the correlation between the change in the spectral line profile and the corresponding RV, full width at half maximum (FWHM) and bisector span (BIS) values derived from the classical cross-correlation function. This algorithm has been tested on three intensively observed stars: Alpha Centauri B (HD128621), Tau Ceti (HD10700), and the Sun. By injecting simulated planetary signals at the spectral level, we demonstrate that our machine learning algorithm can achieve, for HD128621 and HD10700, a detection threshold of 0.5 m/s in semi-amplitude for planets with periods ranging from 10 to 300 days. This threshold would correspond to the detection of a $\sim$4$\mathrm{M}_{\oplus}$ in the habitable zone of those stars. On the HARPS-N solar dataset, our algorithm is even more efficient at mitigating stellar activity signals and can reach a threshold of 0.2 m/s, which would correspond to a 2.2$\mathrm{M}_{\oplus}$ planet on the orbit of the Earth. To the best of our knowledge, it is the first time that such low detection thresholds are reported for the Sun, but also for other stars, and therefore this highlights the efficiency of our convolutional neural network-based algorithm at mitigating stellar activity in RV measurements.

5/24/2024

cs.LG

The Detection of a Possible Exoplanet Orbiting KIC 1718360 Using Machine Learning

Jakob Roche

This paper presents the detection of a periodic dimming event in the lightcurve of the G1.5IV-V type star KIC 1718360. This is based on visible-light observations conducted by both the TESS and Kepler space telescopes. Analysis of the data seems to point toward a high rotation rate in the star, with a rotational period of 2.938 days. The high variability seen within the star's lightcurve points toward classification as a rotating variable. The initial observation was made in Kepler Quarter 16 data using the One-Class SVM machine learning method. Subsequent observations by the TESS space telescope corroborated these findings. It appears that KIC 1718360 is a nearby rotating variable that appears in little to no major catalogs as such. A secondary, additional periodic dip is also present, indicating a possible exoplanetary companion.

6/11/2024

cs.LG

Machine learning-based identification of Gaia astrometric exoplanet orbits

Johannes Sahlmann, Pablo Gómez

The third Gaia data release (DR3) contains $\sim$170 000 astrometric orbit solutions of two-body systems located within $\sim$500 pc of the Sun. Determining component masses in these systems, in particular of stars hosting exoplanets, usually hinges on incorporating complementary observations in addition to the astrometry, e.g. spectroscopy and radial velocities. Several DR3 two-body systems with exoplanet, brown-dwarf, stellar, and black-hole components have been confirmed in this way. We developed an alternative machine learning approach that uses only the DR3 orbital solutions with the aim of identifying the best candidates for exoplanets and brown-dwarf companions. Based on confirmed substellar companions in the literature, we use semi-supervised anomaly detection methods in combination with extreme gradient boosting and random forest classifiers to determine likely low-mass outliers in the population of non-single sources. We employ and study feature importance to investigate the method's plausibility and produced a list of 22 best candidates of which four are exoplanet candidates and another five are either very-massive brown dwarfs or very-low mass stars. Three candidates, including one initial exoplanet candidate, correspond to false-positive solutions where longer-period binary star motion was fitted with a biased shorter-period orbit. We highlight nine candidates with brown-dwarf companions for preferential follow-up. One candidate companion around the Sun-like star G 15-6 could be confirmed as a genuine brown dwarf using external radial-velocity data. This new approach is a powerful complement to the traditional identification methods for substellar companions among Gaia astrometric orbits. It is particularly relevant in the context of Gaia DR4 and its expected exoplanet discovery yield.

4/16/2024

cs.LG