Curated Information

November 15, 2010

What is curated information? Check out my article called “Curated information: what it means for researchers” in fumsi, a publication about finding, using, managing and sharing information, at http://web.fumsi.com/go/article/find/61479.

New feature in Google Scholar: search within citing articles

August 17, 2010

Google Scholar’s citation searching capability has recently become much more useful: it now allows a search within the articles that cite a given paper.

The searching of cited references has long been a technique that information professionals use for subject searching. We can discover recent articles of interest by finding those articles that have cited an older relevant paper, using citation indexes such as Scisearch in Dialog. Citation analysis is also used in academia to determine statistically the impact of an article or author.

In the free Internet arena, Google Scholar is one of the systems with the ability to do citation searching. Many articles found with Google Scholar have a ‘cited by’ link that leads to articles which have cited that article.

As an example, suppose you have a copy of an article that is on target for your purpose but is two decades old: “Effects of population structure on DNA fingerprint analysis in forensic science”. Clicking on ‘Related articles’ finds more articles on the topic, but clicking on ‘Cited by 39’ provides still more.

Google Scholar screenshot

Google Scholar’s newest feature provides the additional capability of searching within those citing articles for more precision. One useful application is finding information about a particular technique, answering questions such as “How has that technique been used?” and “Has it been used for …?”

The following paper describes a procedure for extracting DNA: “Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material”. It was cited by 3051 articles in Google Scholar, and many of these are studies in which the technique was used. But 3051 articles are far too many to scroll through. With the new feature, it is possible to search within those results for a particular area of interest by checking the box “Search within articles citing…”. Adding the search term “bone” narrows the results to 305 articles that may have used the technique to extract DNA from bone.

Google Scholar screenshot

Another option is to create an alert for articles citing a particular paper, so that you receive an email notification whenever Google Scholar adds an article citing that paper. Click on the “Cited by…” link for the paper of interest, and then on the envelope icon.

Does Google know it all?

June 25, 2010

Using a Google search to find information to solve an engineering or scientific problem has pitfalls.  You may be evaluating the quality of the sources you are using (see last blog entry: “Look before you Leap”), but you may not be finding information that is complete enough. Even worse, you may believe you have the information you need, but are missing crucial points.

A great majority of the information freely available on the Internet is not found through Google, or even through a combination of search engines. This unfound information includes a lot of information from databases and is known as the “invisible web” or “deep web”. Much has been written about the invisible web, the techniques for finding this information, and the challenging efforts of search engines to search it (see Exploring a Deep Web that Google Can’t Grasp and Deep Web Research 2009).

Besides hard-to-find Internet information, there is other valuable information that is not available at all on the free Internet. It may require access to commercial systems or may not even be available in digital form (e.g., paper journals).

Consider the following situations where a comprehensive search can be critical, and where a Google search alone would not be good enough:

  • A manufacturing engineer needs a complete overview of potential solutions to solve an obscure problem in his process.
  • A forensic scientist needs peer-reviewed articles acceptable to a judge in a Daubert hearing.
  • A geotechnical engineer needs to find geologic studies conducted within a narrow geographical area – some of which could be from early in the 20th century.
  • An R&D scientist for a manufacturing company needs market intelligence about a new market area being considered. What products are currently on the market? What is the structure of the industry? What companies are selling this product?

Using a professional researcher who knows the range and types of potential sources for this information can prevent the loss of time, money or a legal case.

Look before you leap

April 11, 2010

I cringe whenever I hear someone say “I got it from the Internet.” Do you know who put it on the Internet? And why? And when?

Getting the correct information, the “true facts”, can be crucial to the success of any venture. Just because information came from an authoritative source doesn’t mean it is a fact, but it does increase your odds. A scientific article that has been reviewed by peers and approved by editors has more credibility than an article in a trade journal. A balanced report has more credibility than one on the website of an obviously biased organization. A white paper by a company on a topic related to its products may have valuable insight, but should be read with an awareness of the viewpoint of the writer. A science blogger may be quite astute, but we can’t assume that blogs are written with the same objectivity as a peer-reviewed article. A report that has been revised based upon new knowledge has more validity than the earlier version of the report that is still posted on someone else’s site.

Academic libraries often have guidelines about how to evaluate Internet sources, such as the one provided by Cornell University. Even children should learn these skills; see “Be a Good Web Detective” by the Oregon School Library Information System.

Complete evaluation of a source can involve a lot of work: checking out all the cited sources, reviewing other things written by the author, and researching the author’s background. Important questions are: Who funded the research? Are there studies evaluating the bias of this website or publication? Is there a subtle bias in the choice of articles by a scientific journal? Does the traditional and well-respected news organization have a slant?

Although a complete assessment of a source is not warranted in every situation, even a minimal consideration can be of great use and is too often neglected. Some of the basics to use in evaluating a source are little more than common sense:

  • Are the author and date listed?
  • Who is the publisher of the website?
  • Is the language impartial or emotional?
  • Does the website work? A poorly working site can be suspicious, although a professionally designed site can still have bad information.

A quick tip: search on the title of the item you’re evaluating (within quotes). You may find reviews of the item or a newer version of it.
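The quick tip above can be sketched in a few lines of Python. This is only an illustration: the helper name `quoted_title_search_url` and the default Google search address are my assumptions, and any search engine that supports phrase searching with quotes could be substituted.

```python
from urllib.parse import quote_plus


def quoted_title_search_url(title, base="https://www.google.com/search?q="):
    """Build a search URL that looks for the title as an exact phrase.

    `base` is an assumed search-engine address; swap in another engine
    if preferred.
    """
    # Remove stray whitespace and any quotes already around the title,
    # then wrap it in one pair of double quotes for phrase searching.
    phrase = '"' + title.strip().strip('"') + '"'
    # quote_plus percent-encodes the quotes and turns spaces into '+'.
    return base + quote_plus(phrase)


url = quoted_title_search_url("Be a Good Web Detective")
print(url)
```

Pasting the printed address into a browser runs the phrase search; reviews or newer versions of the item, if any exist, should appear among the results.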

Nuclear power information

March 2, 2010

The current news about the possibility of developing new nuclear power plants in the U.S., the controversy over the radioactive waste disposal site at Yucca Mountain, and the appearance of a couple of recent Resource Shelf items had me wondering about the current access to technical information related to nuclear power production and waste. My curiosity was piqued because I’d spent months in the 1980s working with geologists and engineers who were characterizing sites for the disposal of nuclear waste for the U.S. Department of Energy (DOE) and contractors. Now, many years later, commercial radioactive waste is still stored on-site at the power plants and no new power plants have been built in the U.S. since the 1970s.

At that time, besides geology literature for the specific potential locations, we also needed access to scientific literature and government documents directly related to nuclear power plant engineering, radioactive waste chemistry, and safety. In those days we searched bibliographic databases, such as the one produced by the DOE, through the commercial vendor Dialog on a paper-based terminal with the incredible speed of 300 bits per second (yes, bits; that is not a typo). Today, to get comprehensive coverage of these topics it is still important to use commercial engineering and energy bibliographic databases on Dialog or other systems (at preposterously faster speeds), but much information is now available online at no cost. Some technical papers can be found by searching systems such as Google Scholar, but others will be buried in the deep or invisible web.
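To put 300 bits per second in perspective, here is a rough back-of-the-envelope calculation in Python. The figures are illustrative assumptions, not measurements: a 2,000-character bibliographic record, 10 bits on the wire per character (8 data bits plus start and stop bits, a common setup for dial-up terminals), and a 100 megabit-per-second connection standing in for a modern line.

```python
def transfer_seconds(num_chars, bits_per_second, bits_per_char=10):
    """Seconds needed to send num_chars characters over a serial line.

    bits_per_char=10 is an assumption: 8 data bits plus one start bit
    and one stop bit per character.
    """
    return num_chars * bits_per_char / bits_per_second


record_chars = 2000  # assumed size of one citation with its abstract

old_time = transfer_seconds(record_chars, 300)          # 1980s terminal
new_time = transfer_seconds(record_chars, 100_000_000)  # assumed modern line

print(f"At 300 bit/s: {old_time:.1f} seconds per record")
print(f"At 100 Mbit/s: {new_time * 1000:.1f} milliseconds per record")
```

Under these assumptions a single record took more than a minute to print, which is one reason searches in those days were planned carefully before going online.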

Below is a brief description of some of the topical search systems which cover nuclear science information. Some are bibliographic databases which may or may not also include the full document; others are factual databases that provide data.

Literature

Besides general engineering bibliographic databases, three major sources for searching the range of scientific literature on nuclear energy (or all energy topics) in journals, technical reports and other documents are discussed below.

Energy Citations Database, from the DOE:  Search over 2.6 million science research citations and 221,000 electronic documents, primarily from 1943 forward.

INIS, from the IAEA (International Atomic Energy Agency): Resource Shelf reported that the IAEA has made INIS available for free; it was formerly available only by subscription. The INIS Database contains over 3 million bibliographic records with abstracts, as well as almost 200,000 full-text scientific and technical reports.

ETDEWeb, from the Energy Technology Data Exchange (ETDE), which is an international energy information exchange agreement under the International Energy Agency (IEA): The database contains citations to worldwide literature, both published and non-conventional, with links to full-text documents when possible. It includes 4,291,000 literature references and more than 304,000 full-text documents, including information from the DOE and from agencies in many other countries.

Government documents

Both the DOE and the NRC (Nuclear Regulatory Commission), the two major nuclear power-related agencies in the U.S., provide full-text online access to documents produced by the agencies and their contractors.

For subject searching of, and access to, DOE documents produced mostly since 1991, use the Information Bridge.

NRC ADAMS collections: Resource Shelf pointed out that the NRC has launched a user-friendly web tool for searching the ADAMS collections of NRC technical documents and reports. The link in that article points to the wrong tool, although the confusion is understandable; there are two search tools available through the web.  I’ll try to clarify the differences between the two tools and the two collections they search.

There are two collections within ADAMS. The Publicly Available Records System (PARS) has full-text documents, mostly since November 1999. The Public Legacy Library is mostly bibliographic, with just citations to documents from before November 1999. The two access points to NRC ADAMS are “Web-based Access”, which does not include the Public Legacy Library, and “ADAMS Public Access”, which includes both collections. The second is more confusing to use; be sure to read the Sample Searches on that page before attempting to use it.

Facts

Besides the bibliographic databases above, several governmental agencies and other associations or companies provide actual data in factual databases. Some important examples are listed below.

PRIS (Power Reactor Information System): Since 1970, the IAEA has collected basic data about all nuclear power plants in the world, including energy production and operational information. The IAEA lists about 100 other nuclear information resources at NUCLEUS.

The NRC has a lot of factual information available on its site, including a Facility Information Finder.

The major DOE contractors have a variety of interesting data available, such as the National Nuclear Data Center and International Nuclear Safety Center.

Other information is available from a variety of associations, companies and governments, including a Reactor Database from the World Nuclear Association, and a variety of Nuclear Data Services from the Nuclear Energy Agency (NEA) of the Organisation for Economic Co-operation and Development (OECD).

DeepDyve: a new model for cost-effective access to journal articles

February 11, 2010

Have you ever needed to read a journal article and coveted the free (to you) electronic access you enjoyed when affiliated with a university?

Employees of small- and medium-sized enterprises (SMEs) and individual professionals, for example those in industry R&D and forensic science, often need to peruse scholarly articles. Yet they don’t have the same access to journals as academics or employees of larger companies, who can often make use of online enterprise subscriptions and/or in-house libraries. Smaller corporate libraries and independent information professionals (see AIIP) will provide professional assistance in locating articles but are unlikely to have access to the cost benefits of the large-scale licensing options of larger enterprises.

A recent report commissioned by the Publishing Research Consortium confirmed that SMEs, at least in the UK, have more difficulty accessing professional literature than larger companies and academics (Access by UK small and medium-sized enterprises to professional and academic information, by Mark Ware).

An article can usually be purchased through the publisher’s online system or a commercial document delivery company – but that can get expensive, especially if a large number of articles need to be scanned. So, if the article is not from a journal for which you have a personal or society subscription, or an open access journal that is free on the Internet, what other cost-effective options are there? Academic libraries’ online databases are usually not a legitimate source for nonacademic users because of licensing issues. Public libraries provide access for cardholders to licensed content that includes some scientific journals, perhaps on a delayed basis.

A new and interesting option from DeepDyve is that of renting an article. An article can be rented for 24 hours for $0.99. Major publishers like Elsevier and Springer are not using this system, but others are. Read more about it at: Hope dyves deep in her review of DeepDyve! and DeepDyve — iTunes comes to Science Publishing.

It remains to be seen whether most major scholarly publishers will accept this model, but meanwhile, DeepDyve is another source to check for obtaining journal articles.

Eipert Information Services

Wolfram|Alpha: shortcomings and another way to use it

February 5, 2010

Wolfram|Alpha, which was released in May 2009, is touted as a “computational knowledge engine”. Its goal is to “provide a single source that can be relied on by everyone for definitive answers to factual queries.” But can it be used for obtaining reliable scientific data?

Wolfram|Alpha doesn’t search the web, but rather, searches its own “internal knowledge base” to answer queries, and perform calculations if necessary. Then, it provides the result in tabular and visual formats. So it has the following components:  linguistic analysis (of the query), curated data, dynamic computation, and computed presentation.

My own interest was piqued by the curated data aspect, and the reliability of that data. Including knowledgeable human judgment in the data-gathering process should ensure at least some level of quality in the data, but is the right data for your purpose actually being found, calculated accurately, and displayed correctly?

W|A is useful for solving equations (see mathematics examples), comparing stocks (enter ‘MSFT, AAPL, GOOG’), or finding a summary of information on a topic (try entering a city name or date). For other interesting ways to use W|A, see Wolfram|Alpha: The Use Cases.

The question still remains: can it be used for obtaining reliable scientific data?

One relevant issue is the current scope. Its ambitious long-term goal is to “make all systematic knowledge immediately computable and accessible to everyone.” Some subject areas are covered well whereas others don’t yet have much data. Check out examples of what is currently available in various fields of knowledge at Wolfram|Alpha examples.

A more important issue is the reliability of the data and calculations.  According to its FAQ, under Education & Research, Wolfram|Alpha is a primary source for academic purposes, and should be cited as such. But how credible and reliable can this curated, computed data be since the original source is unknown?

Listed below are two examples of problems. They discuss the limitations of W|A as a source of data for a serious researcher or a medical professional.

W|A itself, in its Terms of Use, says “Common sense, and these Terms of Use, require that you independently verify the accuracy, completeness, and relevance of any information you get from Wolfram|Alpha before relying on it for any purpose in which things of value are at stake.”

The problem in verifying W|A results is that it is impossible to trace any particular item of data to its source, even though W|A does provide some source information.

For example, when searching for properties of various aluminum alloys, try entering ‘aluminum alloy’. W|A comes up with tables of composition and properties of one aluminum alloy, AA201.0-T6, with a drop-down box to choose other alloys. Clicking on ‘Source information’, at the bottom of the screen, yields little of value as the Primary Source is Wolfram|Alpha curated data, 2010, which doesn’t tell us anything we didn’t already know.

The ‘Background sources and references’ section can be interesting, and includes online sources as well as good old-fashioned reference books, but the list is not very specific to the query. In the ‘aluminum alloy’ case, the list includes the background sources for the entire Materials category: metals, chemicals, lumber, and more.

One way I like to use W|A, although it is not a way that was intended, is to look at the list of sources, handbooks, or databases on a topic in order to use them directly. In the aluminum alloy example, a searcher could discover such sources as eFunda and MatWeb. These sources can be more readily evaluated for credibility than W|A.

W|A is interesting for casual use, and likely to improve, but it is hard to see how this system could be considered an authoritative source for scientific data without revealing where each item of data has come from.


