Feature Selection for Regression Problems Based on the Morisita Estimator of Intrinsic Dimension

2016-01-31
Data acquisition, storage and management have improved, yet the key factors driving many phenomena remain poorly understood. Consequently, irrelevant and redundant features artificially inflate the size of datasets, which complicates learning tasks such as regression. Feature selection methods have been proposed to address this problem. This paper introduces a new supervised filter based on the Morisita estimator of intrinsic dimension. It can identify relevant features and distinguish between redundant and irrelevant information. It also offers a clear graphical representation of the results and can be implemented easily in different programming languages. Comprehensive numerical experiments are conducted on simulated datasets of varying complexity, sample size and noise level. The suggested algorithm is also successfully tested on a selection of real-world applications and compared with RReliefF using an extreme learning machine. In addition, a new measure of feature relevance is presented and discussed.
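
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a filter built on a Morisita-type intrinsic dimension estimate might work, not the authors' implementation. It assumes the two-point (m = 2) Morisita index computed on a regular grid, an intrinsic dimension taken as the embedding dimension minus the slope of log(index) against log(1/delta), and a dissimilarity defined as the dimension of the joint feature-target space minus that of the feature space. The names `morisita_id`, `dissimilarity` and `cells_per_axis` are hypothetical.

```python
import math
import random
from collections import Counter

def morisita_id(points, cells_per_axis=(2, 4, 8, 16)):
    """Assumed estimator: E minus the slope of log(I) against log(1/delta)."""
    n, dim = len(points), len(points[0])
    lows = [min(p[k] for p in points) for k in range(dim)]
    spans = [(max(p[k] for p in points) - lows[k]) or 1.0 for k in range(dim)]
    xs, ys = [], []
    for ell in cells_per_axis:
        # Count points per grid cell; a value at the upper bound joins the last cell.
        counts = Counter(
            tuple(min(int((p[k] - lows[k]) / spans[k] * ell), ell - 1)
                  for k in range(dim))
            for p in points
        )
        pairs = sum(c * (c - 1) for c in counts.values())
        # Two-point Morisita index; must be > 0, so the grid cannot be too fine.
        index = ell ** dim * pairs / (n * (n - 1))
        xs.append(math.log(ell))  # log(1/delta) with delta = 1/ell
        ys.append(math.log(index))
    # Ordinary least-squares slope of log(index) against log(1/delta).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return dim - slope

def dissimilarity(features, target):
    """Assumed relevance score: ID(features + target) - ID(features)."""
    joint = [tuple(f) + (t,) for f, t in zip(features, target)]
    return morisita_id(joint) - morisita_id(features)

# Toy data: y depends on x1 only, x2 is irrelevant.
random.seed(0)
x1 = [random.random() for _ in range(5000)]
x2 = [random.random() for _ in range(5000)]
y = [v * v for v in x1]

diss_relevant = dissimilarity([(v,) for v in x1], y)
diss_irrelevant = dissimilarity([(v,) for v in x2], y)
print(diss_relevant, diss_irrelevant)
```

Under these assumptions, a relevant feature should give a dissimilarity near 0, because the target adds almost no dimension to the joint space, while an irrelevant feature should give a clearly larger value. A forward selection loop would then repeatedly add the candidate feature that minimizes this score.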

Related Articles

2017-06-28

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit.…

2017-11-22

Modern biomedical data mining requires feature selection methods that can (1) be applied to large sc…

2018-08-10

Extracting characteristics from the training datasets of classification problems has proven effectiv…

2016-10-25

The all-relevant problem of feature selection is the identification of all strongly and weakly relev…

2018-10-14

Data preprocessing techniques are devoted to correct or alleviate errors in data. Discretization and…

2019-04-23
1904.10387 | cs.LG

We introduce an algorithm that learns correlations between two datasets, in a way which can be used …

2018-10-24

In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully ha…

2018-04-10

The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., …

2019-03-17

High-dimensional data in many machine learning applications leads to computational and analytical co…

2016-04-21

In this paper we present a deep neural network topology that incorporates a simple to implement tran…

2018-01-04

Many of the existing machine learning algorithms, both supervised and unsupervised, depend on the qu…

2019-02-08
1902.02949 | cs.NE

Exploratory data analysis is a fundamental aspect of knowledge discovery that aims to find the main …

2019-05-15

The goal of this paper was to predict the placement in the multiplayer game PUBG (playerunknown batt…

2018-06-15

Centroid-based methods including k-means and fuzzy c-means are known as effective and easy-to-implem…

2018-11-12
1811.04661 | cs.CV

Biclustering is found to be useful in areas like data mining and bioinformatics. The term biclusteri…

2019-02-19
1902.07215 | astro-ph.IM

An ever-looming threat to astronomical applications of machine learning is the danger of over-fittin…