Excel programming for nonprogrammers

May 08, 2012 by Larry Hardesty
In this illustration of the researchers' system in action, a user provides a pair of examples that correlate data — product names and dates — with an arithmetic expression. The system automatically determines the sources of the data — the tables below — and generalizes from the examples to additional expressions (in bold).

Microsoft’s Visual Basic programming language lets Excel users customize their spreadsheets in all kinds of time-saving ways, but few people take advantage of it. Although designed to be intuitive and easy to use, Visual Basic can still be daunting to users with no previous programming experience.

As both an intern and a consultant at Microsoft, Rishabh Singh, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, has worked with Microsoft Research’s Sumit Gulwani to develop a system that lets Excel users customize their spreadsheets simply by giving examples of how they want data processed. Singh and Gulwani describe the work in a pair of papers they’re presenting this summer at the Computer-Aided Verification and Very Large Databases conferences.

The system has two main components. The first, which Singh and Gulwani developed during Singh’s first internship at Microsoft, can modify text strings on the basis of a few examples. For instance, if you had a spreadsheet in which dates were entered in the form “17 January 2009,” you could simply type one or two examples of dates in the form “Jan. 17, 2009”; the system would automatically deduce that you want the order of the month and day reversed, the month truncated after the first three letters, and punctuation added in the appropriate places.

But over the last two years, Singh and Gulwani have greatly expanded the power of their system by allowing it to exploit tables that establish correlations between different types of data. Suppose that you had a column of data from an old database in which the date “January 17, 2009” was instead rendered in the form “011709.” With the latest version of the system, you could create a 12-entry table correlating numbers with the names of months: “01” for January, “02” for February, and so on. Now, again on the basis of just a few examples, the system would learn to convert the numbers in the text strings to the names of the corresponding months.

Changing tables

Table-based correlations are particularly useful because, in practice, companies often have multiple databases or spreadsheets that contain different types of data about the same objects. For instance, a company might have Excel files from different retailers reporting sales figures and separate in-house documents recording R&D expenses for the same products. Using Singh and Gulwani’s system, an Excel user could create a new file that compares aggregated sales figures with aggregated R&D expenses, again just by providing a few examples. The user does not even need to specify which information came from which table: The system automatically finds correlations across different sources.

At the moment, Singh says, the system works only with text. So, for instance, it could learn to automatically produce text strings that represent mathematical relationships between data, but it couldn’t evaluate them. That should be easy to remedy, however: Excel already has a “formula builder” function that converts strings such as “(323 + 73) / 9” into the corresponding mathematical expressions.

In the future, Singh and Gulwani’s work could find its way into a commercial release of Excel. Indeed, the latest beta release of Excel includes a simpler example-based training system based on some of Gulwani’s earlier work.

The blow-up

Technically, the chief challenge in designing the system was handling the explosion of possible interpretations for any group of examples. Suppose that you had a list of times in military format that you wanted to convert to conventional hour-and-minute format. Your first example might be converting “1515” to “3:15.” But which 15 in the first string corresponds to the 15 in the second? It’s even possible that the string “3:15” takes its 1 from the first 1 in “1515” and its 5 from the second 5. Similarly, the first 15 may correspond to the 3, but it’s also possible that all the new strings are supposed to begin with 3’s.

“Typically, we have millions of expressions that actually conform to a single example,” Singh says. “Then we have multiple examples, and I’m going to intersect them to find common expressions that work for all of them.” The trick, Singh explains, was to find a way to represent features shared by many expressions only once each. In experiments, Singh and Gulwani found that they never needed more than three examples in order to train their system.

“If you look at the macros [small programs] that one would have to write in order to perform those text transformations manually, compared to the few demonstrations that you do as an end-user, it’s quite amazing how much programming you can avoid doing through this system,” says Rastislav Bodík, an associate professor of computer science at the University of California at Berkeley who specializes in automatic synthesis.

Bodík says the researchers addressed two central technical problems. The first is that “you need to search humongous spaces of programs,” Bodík says, “so you need good algorithms for doing search in that space of programs that can prune effectively huge subspaces.” But the second problem “is the interaction with the user,” Bodík says. “It seems that they figured out a pretty nice system for how the user can convey to the synthesizer of programs what they have in mind without really having to know what’s going on underneath.”

Explore further: Computer scientists can predict the price of Bitcoin

Related Stories

A call for an evolved understanding of emotion

Jan 04, 2012

(Medical Xpress) -- Many scientists believe that all people experience and express the same biologically “basic” emotions — an idea they have attributed to evolutionist Charles Darwin and one ...

iSchool prof predicts the future of search user interfaces

Nov 07, 2011

School of Information professor Marti Hearst predicts the future of online search interfaces in an article in this month’s edition of the Communications of the ACM. “The future of user interfaces will involv ...

NASA shuts down its last mainframe computer

Feb 15, 2012

NASA has just powered down its last mainframe computer. Umm, everyone remembers what a mainframe computer is, right? Well, you certainly must recall working with punched cards, paper tape, and/or magnetic ...

The kids are alright

May 26, 2011

Children should be seen and not heard... who says? A Philosophy academic at The University of Nottingham is challenging the adage by teaching primary school children to argue properly.

Recommended for you

Tech firm fined for paying workers $1.21 per hour

1 hour ago

A Silicon Valley company is paying more than $43,000 in back wages and penalties after labor regulators found eight employees imported from India were being treated like they were in an overseas sweat shop while they were ...

User comments : 2

Adjust slider to filter visible comments by rank

Display comments: newest first

jimbo92107
not rated yet May 08, 2012
Visual Basic, "intuitive?" ROFL. Falling down is intuitive. Programming is not.
alfie_null
not rated yet May 10, 2012
I'm not completely comfortable with this idea. How can I be assured that edge conditions have been identified and dealt with? For instance: a date might be expressed as m-d-y or d-m-y. Consider the serious uses to which spreadsheets are put (e.g. corporate financials). Worth spending the extra resources on rigorous analysis, testing, etc.

Hammering screws: Some people I work with know only spreadsheets. They go to amazing lengths to fit their problems (and data) into that model.