Posts Tagged ‘Data’

Black box software: a problem for science that extends to big data

May 17th, 2013

You probably don’t need to know how a calculator makes two plus two equal four, or how your favorite smartphone app works, but the way the background software is implemented can make a big difference to the output. Slight rounding errors or slow load times in these cases might be annoying, but when you scale up to big data modeling, for instance, you might want to take a closer look at the software running your calculations before you click go.

Blind trust in black box, or click-and-run, software is a growing problem in science, according to a commentary published Thursday in the journal Science, and the concern extends beyond formal research to other domains that use high performance computing.

The researchers who addressed the “troubling trend in scientific software use” were motivated by a growing unease that the abundance of powerful software is letting scientists derive answers without a thorough understanding of what the software is doing. Software snafus have been responsible for some high-profile data misinterpretations and retractions.

This wouldn’t normally cause a blip on the average citizen’s radar, but now a lot of these scientific conclusions have real-world implications, from climate modeling and weather forecasting to high volume financial trading. In any domain using big data, misplaced trust in the power of software can be problematic, particularly when the decision makers don’t know what the software they are using is doing, said lead author Lucas Joppa of Microsoft Research.

So what does ecology have to do with any of this? Joppa is an ecologist by training, and works on computational techniques in that field that may also have applications for big data more broadly. He and his colleagues surveyed scientists in a sub-field of ecology — species distribution modeling (SDM) — to find out how they choose software and how well they understand its inner workings.

“Lots of SDM techniques are only available as computational methods, but there is a lot of discourse going on in the literature about whether the methods themselves are correct,” said Joppa. Scientists use SDM to forecast where plants and animals will be in the future given current numbers, known habitats, and climate change. It’s a niche area of research, but the disquieting survey results should be noted in any domain where forecasting is done by plugging data into software.

Only 8 percent of the more than 400 scientists who responded had validated their modeling software against other methods. “The number speaks for itself,” said Joppa. “The real crux of the problem is the results from software being published in a peer-reviewed journal, versus the software itself having been peer-reviewed.” Peer review of the software itself is rare. Software packages, whether proprietary or not, are often black box systems that can’t be opened and inspected. Even if you can get under the proverbial hood, as with open source software, said Joppa, most people will still have no idea what they are looking at, or how to judge its quality.

To top it all off, building confidence in what your software is doing runs into a massive computational catch-22: how do you know the software is giving you the right answer, if you can’t get the answer without running the software? The level of confusion over what algorithms are doing in the SDM field is illustrated by a debate over which of two statistical techniques is superior. It turns out, Joppa explained, that the two techniques are mathematically equivalent, but the ways they were implemented in software resulted in big predictive differences.
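A small sketch can show how this happens in general. The example below is not one of the SDM techniques Joppa describes; it is a generic illustration, with hypothetical function names and made-up data, of two formulas for variance that are algebraically identical yet disagree once implemented in floating-point arithmetic. It also shows the kind of cross-check against a second method that few survey respondents reported doing.

```python
import math


def variance_two_pass(xs):
    """Population variance as the mean squared deviation from the mean."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / n


def variance_one_pass(xs):
    """Population variance as E[x^2] - (E[x])^2.

    Algebraically identical to the two-pass formula, but it subtracts two
    huge, nearly equal numbers, so rounding error can swamp the result.
    """
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean * mean


# Made-up data: small measurements, then the same values shifted by 1e9.
small = [4.0, 7.0, 13.0, 16.0]
shifted = [1e9 + x for x in small]

# Shifting every value by a constant should not change the variance.
for name, data in (("small", small), ("shifted", shifted)):
    a = variance_two_pass(data)
    b = variance_one_pass(data)
    print(f"{name}: two-pass={a}, one-pass={b}, agree={math.isclose(a, b)}")
```

On the small data both functions return 22.5. On the shifted data the two-pass version still returns 22.5, while the one-pass version drifts far from it, even though nothing about the underlying mathematics has changed. Comparing two independent implementations on the same input is one inexpensive way to catch this sort of discrepancy.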

This sort of mix-up isn’t surprising given the messy nature of software development (if you can even call it that) in research environments. Joppa lauded efforts like Software Carpentry that teach scientists basic software fundamentals for better programming, and said the days of getting a doctorate by merely pushing a button are over.

“Scientists themselves can learn a bare minimum of software engineering,” said Joppa. On the flip side, he said computer science students should have more exposure to scientific methods. “People with traditional software engineering training become uncomfortable with the way scientists want to work with software, where the design and specs are constantly changing. The way that scientific software is built is fundamentally different from consumer apps.”

Developers of scientific software, like MathWorks or SAS, may want to watch this space. If Joppa’s suggestions are implemented, journals may start requiring that even proprietary software be opened up for inspection and peer review. Nearly half of the surveyed ecologists report using the free statistical language R as their primary software, so maybe there is hope yet, both for open, inspectable code, and for computational science becoming more accessible while yielding trustworthy, high-impact results.

Source: http://gigaom.com/2013/05/16/black-box-software-a-problem-for-science-that-extends-to-big-data-2/

Facebook seeking even more personal data

April 1st, 2013

Every day, tens of millions of people post remarkably intimate details about their lives on Facebook. And yet the operators of the online social network say they still don’t know enough about their subscribers.

So Facebook is purchasing even more information on its members from data brokers — companies that collect huge amounts of sensitive information about the everyday activities of millions of Americans. Facebook will use the data, as well as information provided voluntarily by members, to target them with more relevant — and profitable — advertisements.

“We think that serving the right ads to the right people creates a better user experience on Facebook,” said company spokeswoman Elisabeth Diana.

But privacy activists are alarmed by Facebook’s trafficking in so much sensitive data.

The Electronic Privacy Information Center, an online privacy watchdog, has urged the Federal Trade Commission to investigate Facebook’s use of such data, fearing it might violate an FTC consent order that required Facebook to toughen its protections for user privacy.

Originally, Facebook partnered with a single data broker, Colorado-based Datalogix, which resells customer information obtained from retailers. In late February, Facebook said it would also obtain data from three more providers: Acxiom Corp. in Little Rock, Ark., Epsilon of Dallas, and BlueKai Inc., of Cupertino, Calif.

In response, another Internet activist group, the Electronic Frontier Foundation, published guidelines for consumers to make it more difficult for Facebook and its business partners to keep tabs on their activities.

Facebook users have sometimes responded angrily when the company changes policies on personal information. But the deals with data brokers have elicited little protest.

Diana said that when a person’s Facebook data are combined with, say, a retail shopping history or financial records, the combined history is “anonymized,” so that an advertiser won’t know the person’s identity.

Diana said Facebook is not interested in pinpointing individual users, but rather in trying to identify groups of consumers with shared tastes and interests. Combining Facebook data with shopping records from Epsilon, for instance, might identify fortysomething males who buy cholesterol medications and Lee Child thrillers. With that, Facebook can present ads for pharmaceuticals and crime novels to just the right group of users, instead of also broadcasting them to teenagers with a taste for the Twilight books.

“We’re trying to provide people with a better ad experience,” Diana said. “We think we can do this in a way that protects user privacy.”

Still, the new policy underscores how difficult it is for consumers to restrict access to their personal data — online or offline.

When a consumer visits an Internet retailer, it’s likely that the company uses software from BlueKai to record the transaction. When the same consumer goes to a brick-and-mortar supermarket and uses a loyalty card to earn discounts, he is also giving Epsilon or some other company permission to track his purchases.

All this information can be sold by the data brokers to just about anybody.

Sarah Downey, a privacy analyst and attorney at Abine Inc., a Boston maker of privacy-protection software, said consumers can take steps to limit online data collection. Her company makes DoNotTrackMe, a free program that blocks tracking programs.

Under development is MaskMe, a program that generates an e-mail alias so a person can sign up for services without using his or her real e-mail address. Many data brokers and online sites use e-mail addresses as the main identifier of consumers, so using a different alias with each company makes it harder for them to combine their databases.
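The reasoning behind per-company aliases can be sketched in a few lines. This is a simplified illustration with invented records and a hypothetical `link_records` helper, not a description of how MaskMe or any data broker actually works; real brokers may also match on other identifiers such as names or postal addresses.

```python
def link_records(db_a, db_b):
    """Join two customer tables on e-mail address, the shared identifier."""
    shared = db_a.keys() & db_b.keys()
    return {email: (db_a[email], db_b[email]) for email in shared}


# Invented example: the same person signs up everywhere with one address.
shop_same = {"pat@example.com": "bought a thriller novel"}
site_same = {"pat@example.com": "follows running pages"}

# Invented example: the same person uses a different alias per company.
shop_alias = {"alias-shop-7f3a@example.com": "bought a thriller novel"}
site_alias = {"alias-site-c21e@example.com": "follows running pages"}

print(link_records(shop_same, site_same))    # one combined profile
print(link_records(shop_alias, site_alias))  # no match on e-mail alone
```

With a single address, the two tables collapse into one profile keyed by that address. With distinct aliases, a join on e-mail finds nothing to merge.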

In addition, people can directly contact the brokers — Acxiom, Datalogix, Epsilon, and BlueKai — and ask not to be tracked. But the process is tedious and time-consuming, and there are always other businesses offering attractive services or tempting discounts in exchange for personal information.

Source: http://bostonglobe.com/business/2013/03/31/limiting-facebook-access-your-personal-data/6rJmUqnW2uzbrkJj3IQYxM/story.html

User-friendly software to access census data launched

December 18th, 2012

CensusInfo India, new user-friendly database software to help people access, use and understand the statistical data of India’s latest population and housing census, was launched in the capital Monday.

“The CensusInfo India software is an innovative and flexible database technology. It helps the public to easily access, use and understand the statistics provided in the population and housing census, 2011, and reduces the burden of statistical drudgery,” Registrar General and Census Commissioner of India C. Chandramouli said.

He added that data from other censuses would also be incorporated in the CensusInfo India module.

“The house listing and housing census has immense utility as it provides comprehensive data on the conditions of human settlements and housing deficit. So the easily available data can be used by departments of the central and state governments as well as NGOs,” Chandramouli said.

Source: http://zeenews.india.com/news/net-news/user-friendly-software-to-access-census-data-launched_817373.html
