April 20, 2016

Helping computers learn to tackle big-data problems outside their comfort zones

by Agency for Science, Technology and Research (A*STAR), Singapore

Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately.

Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.

A common way for computers to process data is called representation learning. This involves identifying features that allow the program to quickly extract relevant information from a dataset and categorize it, a bit like a shortcut. Supervised and unsupervised learning are two of the main approaches used in representation learning. Unlike supervised learning, which relies on costly labeling of data before processing, unsupervised learning groups, or 'clusters', unlabeled data in a manner similar to how our brains work, explains Peng.

Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace, uncovering an intrinsic simplicity that makes complex, real-world data tractable. However, existing subspace clustering methods struggle to handle 'out-of-sample', or unknown, data points, as well as the large datasets that are common today.

"One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on 'in-sample', or known, observational data," explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. The methods differ in how they implement representation learning: one focuses on sparsity, while the other two focus on low rank and grouping effects. "By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible," notes Peng.
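The three emphases named above (sparsity, low rank and grouping effects) correspond to well-known self-expressive formulations in the subspace-clustering literature. The block below states them in their common textbook form, assuming the in-sample points are stored as the columns of a matrix $X \in \mathbb{R}^{d \times n}$ and that each point is written as a combination of the others through a coefficient matrix $C \in \mathbb{R}^{n \times n}$. The exact objectives, constraints and noise terms used in the team's paper may differ, and $\lambda > 0$ is an assumed trade-off parameter.

$$
\begin{aligned}
\text{sparsity:}\quad & \min_{C}\ \|C\|_1 \quad \text{s.t.}\quad X = XC,\ \operatorname{diag}(C) = 0,\\
\text{low rank:}\quad & \min_{C}\ \|C\|_* \quad \text{s.t.}\quad X = XC,\\
\text{grouping effect:}\quad & \min_{C}\ \|X - XC\|_F^2 + \lambda \|C\|_F^2
\;\Longrightarrow\; C^\star = (X^\top X + \lambda I)^{-1} X^\top X.
\end{aligned}
$$

Here $\|C\|_1$ sums the absolute values of the entries, $\|C\|_*$ is the nuclear norm (the sum of singular values) and $\|\cdot\|_F$ is the Frobenius norm. The closed form in the last line follows from setting the gradient $-2X^\top(X - XC) + 2\lambda C$ to zero. In methods of this family, the learned $C$ is typically turned into an affinity matrix such as $|C| + |C|^\top$ and passed to spectral clustering.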

The framework devised by the team first splits the input data into 'in-sample' and 'out-of-sample' data during an initial 'sampling' step. Next, during a 'clustering' step, the in-sample data is grouped into subspaces. Finally, each out-of-sample data point is assigned to the nearest subspace and designated as a member of the corresponding cluster.
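The three steps described above can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not the team's implementation: the data are 2-D points, each subspace is a line through the origin, the sampling step is a seeded uniform random draw, and the clustering step is a simple k-subspaces loop standing in for the sparse, low-rank and least-squares representations the paper builds on. All function names (`fit_line`, `cluster_in_sample`, `sample_cluster_assign` and so on) are hypothetical.

```python
import math
import random


def fit_line(points):
    """Best-fit line through the origin (a 1-D subspace) as a unit vector.

    Uses the principal eigenvector of the 2x2 scatter matrix sum(p p^T),
    whose angle is 0.5 * atan2(2b, a - c).
    """
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(theta), math.sin(theta))


def residual(p, u):
    """Distance from point p to the line spanned by unit vector u."""
    return abs(p[0] * u[1] - p[1] * u[0])


def nearest(p, lines):
    """Index of the subspace (line) closest to point p."""
    return min(range(len(lines)), key=lambda j: residual(p, lines[j]))


def cluster_in_sample(points, k, iters=20):
    """Hypothetical stand-in for the clustering step: k-subspaces with lines.

    Returns the fitted lines and the label of each in-sample point.
    """
    # Start from k evenly spaced directions in [0, pi).
    lines = [(math.cos(i * math.pi / k), math.sin(i * math.pi / k))
             for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, lines) for p in points]
        if new_labels == labels:  # converged: lines already fit these labels
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous line if a cluster is empty
                lines[j] = fit_line(members)
    return lines, labels


def sample_cluster_assign(points, k, n_in, seed=0):
    """Sampling -> clustering -> assignment, returning one label per point."""
    # 1) Sampling: draw n_in in-sample indices; the rest are out-of-sample.
    in_idx = sorted(random.Random(seed).sample(range(len(points)), n_in))
    # 2) Clustering: fit k subspaces using the in-sample points only.
    lines, in_labels = cluster_in_sample([points[i] for i in in_idx], k)
    labels = [None] * len(points)
    for i, lab in zip(in_idx, in_labels):
        labels[i] = lab
    # 3) Assignment: each out-of-sample point joins its nearest subspace.
    for i, p in enumerate(points):
        if labels[i] is None:
            labels[i] = nearest(p, lines)
    return labels


def on_line(deg, radii):
    """Points at signed distances `radii` along the line at angle `deg`."""
    t = math.radians(deg)
    return [(r * math.cos(t), r * math.sin(t)) for r in radii]


radii = [1, -2, 3, -4, 5, 6, -7, 8, 9, -10]
pts = on_line(30, radii) + on_line(120, radii)
labels = sample_cluster_assign(pts, k=2, n_in=6)
print(labels)  # ten 0s followed by ten 1s
```

Because out-of-sample points are only compared against the k fitted subspaces, the assignment step costs time linear in their number; that is what lets the expensive clustering step run on a small sample.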

The team tested their approach on a range of datasets containing different types of information: facial images, handwritten and digital text, poker hands and forest coverage. They found that their methods outperformed existing algorithms, reducing the computational complexity (and hence the running time) of the task while still ensuring cluster quality.
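Why sampling reduces the computational cost can be seen with rough arithmetic. Assume, purely for illustration, that the clustering step costs on the order of $n^3$ operations for $n$ points (as solving an $n \times n$ linear system or eigenproblem does), that assigning one point to its nearest subspace costs a constant $a$ that does not grow with $n$, and ignore constant factors. Clustering only $m$ in-sample points and assigning the remaining $n - m$ then gives

$$
\frac{T_{\text{all}}}{T_{\text{sampled}}} \approx \frac{n^3}{m^3 + a\,(n - m)},
\qquad
n = 10^5,\ m = 10^3:\quad \frac{10^{15}}{10^{9} + 99\,000\,a}.
$$

For $a \le 10^3$ the denominator is at most about $1.1 \times 10^9$, so the ratio stays above roughly $9 \times 10^5$. These numbers are hypothetical and are not results reported by the study.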

More information: Xi Peng et al. A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data, IEEE Transactions on Neural Networks and Learning Systems (2015). DOI: 10.1109/TNNLS.2015.2490080

Provided by Agency for Science, Technology and Research (A*STAR), Singapore

Citation: Helping computers learn to tackle big-data problems outside their comfort zones (2016, April 20) retrieved 16 August 2024 from https://phys.org/news/2016-04-tackle-big-data-problems-comfort-zones.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
