# Explained: Regression analysis

##### Mar 16, 2010 by Peter Dizikes

(PhysOrg.com) -- Regression analysis. It sounds like a part of Freudian psychology. In reality, a regression is a seemingly ubiquitous statistical tool appearing in legions of scientific papers, and regression analysis is a method of measuring the link between two or more phenomena.

Imagine you want to know the connection between the square footage of houses and their sale prices. A regression charts such a link, in so doing pinpointing “an average causal effect,” as MIT economist Josh Angrist and his co-author Jorn-Steffen Pischke of the London School of Economics put it in their 2009 book, “Mostly Harmless Econometrics.”

To grasp the basic concept, take the simplest form of a regression: a linear, bivariate regression, which describes an unchanging relationship between two (and not more) phenomena. Now suppose you are wondering if there is a connection between the time high school students spend doing French homework, and the grades they receive. These types of data can be plotted as points on a graph, where the x-axis is the average number of hours per week a student studies, and the y-axis represents exam scores out of 100. Together, the data points will typically scatter a bit on the graph. The regression analysis creates the single line that best summarizes the distribution of points.

Mathematically, the line representing a simple linear regression is expressed through a basic equation: $Y = a_0 + a_1 X$. Here $X$ is hours spent studying per week, the “independent variable.” $Y$ is the exam scores, the “dependent variable,” since (we believe) those scores depend on time spent studying. Additionally, $a_0$ is the y-intercept (the value of $Y$ when $X$ is zero) and $a_1$ is the slope of the line, characterizing the relationship between the two variables.

Using two slightly more complex equations, the “normal equations” for the basic linear regression line, we can plug in all the numbers for $X$ and $Y$, solve for $a_0$ and $a_1$, and actually draw the line. That line minimizes the sum of the squared vertical distances between the data points and itself; this is the “Ordinary Least Squares” (OLS) method mentioned in mountains of academic papers.
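As a rough illustration of that step, the sketch below solves the two normal equations for a handful of points. The function name `fit_line` and the hours and scores are made up for this example; they do not come from any real class.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the intercept a0 and slope a1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Normal equations for a straight line:
    #   sum_y  = n * a0     + a1 * sum_x
    #   sum_xy = a0 * sum_x + a1 * sum_xx
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denom
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Hypothetical data: weekly study hours and exam scores out of 100.
hours = [1, 2, 3, 4, 5]
scores = [55, 61, 70, 74, 85]

a0, a1 = fit_line(hours, scores)
print(f"intercept a0 = {a0:.2f}, slope a1 = {a1:.2f}")
```

With these invented numbers the slope comes out at roughly 7.3 points per extra hour of study, which is a property of the toy data and nothing more.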

To see why OLS is logical, imagine a regression line running 6 units below one data point and 6 units above another point; it is 6 units away from the two points, on average. Now suppose a second line runs 10 units below one data point and 2 units above another point; it is also 6 units away from the two points, on average. But if we square the distances involved, we get different results: $6^2 + 6^2 = 72$ in the first case, and $10^2 + 2^2 = 104$ in the second case. So the first line yields the lower figure, the “least squares,” and is a more consistent reduction of the distance from the data points. (Additional methods, besides OLS, can find the best line for more complex forms of regression analysis.)
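The arithmetic in this two-line comparison can be checked with a few lines of Python. The helper names `mean_abs_distance` and `sum_of_squares` are chosen for this sketch only; signed distances are positive where the point sits above the line and negative where it sits below.

```python
def mean_abs_distance(distances):
    """Average distance between the line and the points, ignoring sign."""
    return sum(abs(d) for d in distances) / len(distances)


def sum_of_squares(distances):
    """Sum of squared distances, the quantity that least squares minimizes."""
    return sum(d * d for d in distances)


# First line: 6 units below one point, 6 units above the other.
first_line = [6, -6]
# Second line: 10 units below one point, 2 units above the other.
second_line = [10, -2]

for name, distances in [("first", first_line), ("second", second_line)]:
    print(name, mean_abs_distance(distances), sum_of_squares(distances))
```

Both lines have the same average distance, so only the squared totals tell them apart.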

In turn, the typical distance between the line and all the points (sometimes called the “standard error” of the regression) indicates whether the regression analysis has captured a relationship that is strong or weak. The closer a line is to the data points, overall, the stronger the relationship.
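One common way to put a number on that typical distance is the residual standard error: the square root of the summed squared residuals divided by the number of points minus two. The sketch below assumes that definition, which the article itself does not spell out, and uses invented data with line coefficients that are assumed to have been fitted already.

```python
import math


def residual_standard_error(xs, ys, a0, a1):
    """Typical vertical distance between the points and the line a0 + a1 * x.

    Divides by n - 2 because two quantities (a0 and a1) were estimated.
    """
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need more than two (x, y) pairs of equal length")
    residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))


# Hypothetical data and an assumed, already-fitted line (illustrative only).
hours = [1, 2, 3, 4, 5]
scores = [55, 61, 70, 74, 85]
a0, a1 = 47.1, 7.3

se = residual_standard_error(hours, scores, a0, a1)
print(f"residual standard error: {se:.2f} points")
```

A smaller value means the points sit closer to the line; the value is in the same units as the exam scores.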

Regression analysis, again, establishes a correlation between phenomena. But as the saying goes, correlation is not causation. Even a line that fits the data points closely may not say something definitive about causality. Perhaps some students do succeed in French class because they study hard. Or perhaps those students benefit from better natural linguistic abilities, and they merely enjoy studying more, but do not especially benefit from it. Perhaps there would be a stronger correlation between test scores and the total time students had spent hearing French spoken before they ever entered this particular class. The tale that emerges from good data may not be the whole story.
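A toy simulation can make the point about hidden factors concrete. In the sketch below, every number is invented: a made-up "aptitude" value drives both study hours and scores, while the scores are generated without any reference to the hours. The two still come out strongly correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor: each student's linguistic aptitude.
aptitude = [rng.gauss(0, 1) for _ in range(1000)]

# Toy model: able students study more AND score higher,
# but the score formula never uses the hours studied.
hours = [3 + a + rng.gauss(0, 0.5) for a in aptitude]
scores = [70 + 8 * a + rng.gauss(0, 3) for a in aptitude]

r = pearson(hours, scores)
print(f"correlation between hours and scores: {r:.2f}")
```

A regression of scores on hours would show a clear upward slope here, even though, by construction, extra study time does nothing in this toy world.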

So it still takes critical thinking and careful studies to locate meaningful cause-and-effect relationships in the world. But at a minimum, regression analysis helps establish the existence of connections that call for closer investigation.

##### thermodynamics
not rated yet Mar 16, 2010
It would be freshman stats class if the numbers were right. Since they cannot show exponentiation in the text they have 62 instead of 6^2 and the same for the other squares. So, they don't add up as typeset. You would think that a web site devoted to science could have some sort of math typesetting.
##### Roj
5 / 5 (1) Mar 16, 2010
Available to most browsers and text editors, ALT-0178 ² or X² has been the standard ^2 ASCII code for several years.
##### toyo
5 / 5 (2) Mar 17, 2010
This 'filler' article is hardly news.
It should be in a beginner's or Basics section.
Lift your game, Physorg!
##### bernie_beckerman
5 / 5 (1) Mar 17, 2010
I have to agree that this is by far one of the most useless articles I've seen here for a number of reasons - in the order of importance

1) This is not news!!!
2) This is a really poor explanation of regression analysis.
3) The fact that OLS is "mentioned in mountains of academic papers" does not necessarily make it an appropriate method of analysing data.
4) Regression does not 'establish correlation', it establishes association. Language is important in scientific writing no matter what its purpose - so be precise! Correlation is an entirely different statistical tool.
5) And if I had more than 1000 characters for this comment I would expound on how articles of this type give individuals without statistical training a false sense of what they understand making them prone to deriving false inference from a study/article they don't really understand.
