August 16, 2018

AI for code encourages collaborative, open scientific discovery

by Kush Varshney, IBM

We have seen significant recent progress in pattern analysis and machine intelligence applied to images, audio and video signals, and natural language text, but much less progress on another artifact that people produce: computer program source code. In a paper to be presented at the FEED Workshop at KDD 2018, we showcase a system that makes progress toward the semantic analysis of code. In doing so, we lay the foundation for machines to truly reason about program code and learn from it.

The work, also recently demonstrated at IJCAI 2018, was conceived and is led by IBM Science for Social Good fellow Evan Patterson, and it focuses specifically on data science software. Data science programs are a special kind of computer code: often fairly short, but full of semantically rich content that specifies a sequence of data transformation, analysis, modeling, and interpretation operations. Our technique executes a data analysis (imagine an R or Python script) and captures every function called during that execution. It then connects those functions to a data science ontology we have created, performs several simplification steps, and produces a semantic flow graph representation of the program. As an example, the flow graph below was produced automatically from an analysis of rheumatoid arthritis data.
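
The capture step is not spelled out here, so the following is only a minimal Python sketch of the general idea under our own assumptions: it uses the standard `sys.setprofile` hook to observe calls while a script runs, a hypothetical `ONTOLOGY` dictionary stands in for the Data Science Ontology, and a dataflow edge is recorded whenever an object returned by one annotated call is later passed as an argument to another. The functions `load_rows`, `drop_missing`, and `column_mean` are made-up examples, not part of the described system.

```python
import sys

# Hypothetical ontology: maps function names to abstract concepts.
ONTOLOGY = {
    "load_rows": "read-data",
    "drop_missing": "drop-missing-values",
    "column_mean": "mean",
}


def trace_flow_graph(analysis):
    """Run `analysis()` and return (nodes, edges) of a dataflow graph
    over the calls whose names appear in ONTOLOGY."""
    nodes = []       # node index -> ontology concept
    edges = set()    # (producer node, consumer node)
    producer = {}    # id(returned object) -> node index that returned it
    keep_alive = []  # hold return values so their ids are not reused
    stack = []       # node indices of annotated calls still running

    def profiler(frame, event, arg):
        name = frame.f_code.co_name
        # Ignore unannotated functions and C-level call events.
        if name not in ONTOLOGY or event not in ("call", "return"):
            return
        if event == "call":
            node = len(nodes)
            nodes.append(ONTOLOGY[name])
            # At call time the frame's locals are exactly the arguments.
            for value in frame.f_locals.values():
                if id(value) in producer:
                    edges.add((producer[id(value)], node))
            stack.append(node)
        else:  # "return": `arg` is the value being returned
            node = stack.pop()
            producer[id(arg)] = node
            keep_alive.append(arg)

    sys.setprofile(profiler)
    try:
        analysis()
    finally:
        sys.setprofile(None)
    return nodes, sorted(edges)


# A tiny made-up "data analysis" to trace.
def load_rows():
    return [{"x": 1.0}, {"x": None}, {"x": 3.0}]


def drop_missing(rows):
    return [r for r in rows if r["x"] is not None]


def column_mean(rows, col):
    return sum(r[col] for r in rows) / len(rows)


def analysis():
    rows = load_rows()
    clean = drop_missing(rows)
    return column_mean(clean, "x")


nodes, edges = trace_flow_graph(analysis)
print(nodes)  # ['read-data', 'drop-missing-values', 'mean']
print(edges)  # [(0, 1), (1, 2)]
```

Matching objects by `id()` is a deliberate simplification: shared immutable values such as `None` can create spurious edges, so a real system would need a more careful notion of object identity.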

The technique applies across programming languages and packages. The three code snippets below are written in R, in Python with the NumPy and SciPy packages, and in Python with the Pandas and Scikit-learn packages. All three produce exactly the same semantic flow graph.
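
One hedged way to picture how different libraries can collapse to a single graph: annotate each library function with an ontology concept, then contract away calls that have no annotation. The function names below are real (R's `read.csv` and `kmeans`, NumPy's `genfromtxt`, SciPy's `kmeans2`, Pandas' `read_csv` and `DataFrame.to_numpy`, Scikit-learn's `KMeans.fit`), but the read-then-cluster pipeline is an assumed example, and the `ANNOTATIONS` table and its concept labels are hypothetical stand-ins for the Data Science Ontology.

```python
# Hypothetical annotations: (package, function) -> ontology concept.
# The concept names are illustrative, not taken from the Data Science Ontology.
ANNOTATIONS = {
    ("r", "read.csv"): "read-tabular-file",
    ("r", "kmeans"): "k-means-clustering",
    ("numpy", "genfromtxt"): "read-tabular-file",
    ("scipy", "kmeans2"): "k-means-clustering",
    ("pandas", "read_csv"): "read-tabular-file",
    ("sklearn", "KMeans.fit"): "k-means-clustering",
}


def to_semantic_graph(raw_edges):
    """Relabel a raw dataflow graph with ontology concepts.

    Unannotated nodes are contracted: each annotated source is linked to every
    annotated node reachable from it through unannotated nodes only.
    """
    successors = {}
    for src, dst in raw_edges:
        successors.setdefault(src, []).append(dst)

    semantic_edges = set()
    for start in successors:
        if start not in ANNOTATIONS:
            continue
        stack, seen = list(successors[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in ANNOTATIONS:
                semantic_edges.add((ANNOTATIONS[start], ANNOTATIONS[node]))
            else:
                stack.extend(successors.get(node, []))  # walk through glue code
    return semantic_edges


# Raw graphs for the same assumed analysis written with three toolchains.
r_graph = to_semantic_graph([(("r", "read.csv"), ("r", "kmeans"))])
numpy_graph = to_semantic_graph([(("numpy", "genfromtxt"), ("scipy", "kmeans2"))])
pandas_graph = to_semantic_graph([
    (("pandas", "read_csv"), ("pandas", "DataFrame.to_numpy")),
    (("pandas", "DataFrame.to_numpy"), ("sklearn", "KMeans.fit")),
])

print(r_graph == numpy_graph == pandas_graph)  # True
```

The Pandas-to-NumPy conversion carries no annotation, so it is treated as glue code and disappears from the semantic graph.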

Credit: IBM

We can think of the semantic flow graph we extract as a single data point, just like an image or a paragraph of text, on which to perform further higher-level tasks. With this representation, we can enable several useful functionalities for practicing data scientists, including intelligent search and auto-completion of analyses, recommendation of similar or complementary analyses, visualization of the space of all analyses conducted on a particular problem or dataset, translation or style transfer, and even machine generation of novel data analyses (i.e., computational creativity). All of these rest on a truly semantic understanding of what the code does.
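
As a hedged illustration of treating a semantic flow graph as a data point, the sketch below ranks a small corpus of analyses by similarity to a query graph, the kind of step a recommendation feature might use. Each graph is a set of (concept, concept) edges; the feature set (concepts plus edges) and the Jaccard similarity are our own simple choices, not the method used in this work, and the corpus entries are invented.

```python
Edge = tuple[str, str]


def features(graph: set[Edge]) -> set[tuple[str, ...]]:
    """Describe a graph by its concept nodes and its concept-to-concept edges."""
    nodes = {("node", concept) for edge in graph for concept in edge}
    edges = {("edge", src, dst) for src, dst in graph}
    return nodes | edges


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_similar(query: set[Edge], corpus: dict[str, set[Edge]]) -> list[tuple[str, float]]:
    """Return corpus entries sorted from most to least similar to the query."""
    q = features(query)
    scores = {name: jaccard(q, features(g)) for name, g in corpus.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical corpus of previously extracted semantic flow graphs.
corpus = {
    "regression": {("read-tabular-file", "linear-regression")},
    "clustering-with-pca": {("read-tabular-file", "pca"), ("pca", "k-means-clustering")},
    "clustering": {("read-tabular-file", "k-means-clustering")},
}
query = {("read-tabular-file", "k-means-clustering")}

for name, score in rank_similar(query, corpus):
    print(f"{name}: {score:.2f}")
# clustering: 1.00
# clustering-with-pca: 0.33
# regression: 0.20
```

Counting shared concepts as well as shared edges lets an analysis that inserts an extra step (here, PCA before clustering) still score above an unrelated one.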

The Data Science Ontology is written in a new ontology language we have developed named Monoidal Ontology and Computing Language (Monocl). This line of work was initiated in 2016 in partnership with the Accelerated Cure Project for Multiple Sclerosis.
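
The word "Monoidal" in the name suggests monoidal categories, in which typed processes can be composed both in sequence and in parallel. The sketch below is a generic Python model of those two operations only; it is not Monocl's syntax or semantics, and the `Box` class, the `identity` helper, and the types and concept names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A hypothetical typed process with input types `dom` and output types `cod`."""
    name: str
    dom: tuple[str, ...]
    cod: tuple[str, ...]

    def then(self, other: "Box") -> "Box":
        """Sequential composition: feed this box's outputs into `other`."""
        if self.cod != other.dom:
            raise TypeError(f"cannot compose {self.cod} -> {other.dom}")
        return Box(f"({self.name} ; {other.name})", self.dom, other.cod)

    def tensor(self, other: "Box") -> "Box":
        """Parallel composition: place both boxes side by side."""
        return Box(f"({self.name} ⊗ {other.name})", self.dom + other.dom, self.cod + other.cod)


def identity(ty: str) -> Box:
    """Pass a value of type `ty` through unchanged."""
    return Box(f"id[{ty}]", (ty,), (ty,))


read_table = Box("read-tabular-file", ("file",), ("table",))
k_means = Box("k-means-clustering", ("table", "int"), ("clustering",))

# Read the file while passing the cluster count alongside it, then cluster.
pipeline = read_table.tensor(identity("int")).then(k_means)
print(pipeline.dom, "->", pipeline.cod)  # ('file', 'int') -> ('clustering',)
```

Type checking at composition time is what makes such a graph meaningful: `read_table.then(read_table)` raises `TypeError` because a table cannot be fed where a file is expected.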

More information: E. Patterson et al. Dataflow representation of data analyses: Toward a platform for collaborative data science, IBM Journal of Research and Development (2017). DOI: 10.1147/JRD.2017.2736278

Provided by IBM

This story is republished courtesy of IBM Research.

Citation: AI for code encourages collaborative, open scientific discovery (2018, August 16) retrieved 10 May 2024 from https://phys.org/news/2018-08-ai-code-collaborative-scientific-discovery.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
