Software provides a clear overview in long documents

Jul 25, 2014
The picture shows an employee of the Bavarian State Library working at a scanning robot in the library's digitization center. Credit: Bayerische Staatsbibliothek / H.-R. Schulz

In the future, software will help users analyze long texts such as the documents for calls for bids, which often run to more than a thousand pages. Experts at Siemens' global research unit Corporate Technology (CT) have developed a search function that lets users look for keywords and sections of text across all of the documents of a call for bids simultaneously, without having to open any of the files. This makes the search very fast: it takes only a few milliseconds before users can read the search results in the documents. The experts also developed a component that checks how requirements have changed compared to previous versions of a specific text. As reported in the current issue of "Pictures of the Future" magazine, the ultimate goal is to create semantic software that recognizes interrelationships in order to find relevant information.

Corporate Technology originally developed the software as part of a feasibility study on digitizing all land registers in Germany. The study called for a system that could automatically record information about owners, property sizes, outstanding mortgages, and other matters from the land registers of the past 50 years (around 500 million pages of PDF files). The software had to extract the required information with the help of the respective document structure, and it had to handle scans of poorly copied typewritten pages or repeatedly corrected documents.

To develop the software for calls for bids in industry, the researchers at CT are cooperating closely with colleagues from the corresponding Siemens businesses. The researchers are using this collaboration as a basis for developing characteristic search algorithms that enable users to find all of the information that a document contains about certain topics such as safety or pollution control.
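
The article does not say how the Siemens search function works internally. One common way to search many documents at once without opening each file at query time is to build an inverted index ahead of time. The following is a minimal Python sketch under that assumption; the function names `build_index` and `search` and the sample documents are hypothetical and serve only as illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of (document, page) pairs containing it.

    `documents` maps a document name to a list of page texts.
    """
    index = defaultdict(set)
    for name, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            for word in re.findall(r"\w+", text.lower()):
                index[word].add((name, page_number))
    return index


def search(index, query):
    """Return the sorted (document, page) pairs that contain every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample data standing in for the documents of one call for bids.
documents = {
    "bid_part_a": [
        "Safety requirements for the turbine hall.",
        "Delivery schedule and penalties.",
    ],
    "bid_part_b": [
        "Pollution control and safety audits.",
        "Payment terms.",
    ],
}

index = build_index(documents)
print(search(index, "safety"))
print(search(index, "pollution control"))
```

Because the index is built once, each query reduces to a few set lookups and intersections, which is one plausible reason a search over many files can return results in milliseconds.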

Because calls for bids are repeatedly adjusted during a project, the software then identifies and displays any changes compared to previous versions of the document. In the third step, the software looks for analogies to previous, similar calls for bids so that users can see how certain requirements were evaluated in the past. The automatic semantic evaluation of large calls for bids saves time, prevents mistakes, and makes it easier for users to integrate and analyze changes that were made at short notice.
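
The version comparison described above can be illustrated with a plain line-by-line diff. This is only a sketch using Python's standard `difflib` module, not the method used by the Siemens researchers; the function name `changed_requirements` and the sample requirement texts are hypothetical.

```python
import difflib


def changed_requirements(old_text, new_text):
    """List the differences between two versions of a text, line by line.

    Each entry is (kind, old_lines, new_lines), where kind is
    "replace", "delete", or "insert".
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":  # keep only the parts that differ
            changes.append((kind, old_lines[i1:i2], new_lines[j1:j2]))
    return changes


# Hypothetical requirement lists from two versions of the same document.
old_version = "Noise limit: 85 dB\nDelivery: 12 weeks\nWarranty: 2 years"
new_version = (
    "Noise limit: 80 dB\nDelivery: 12 weeks\nWarranty: 2 years\n"
    "On-site training required"
)

for change in changed_requirements(old_version, new_version):
    print(change)
```

A textual diff like this only shows which lines changed. Judging whether a change actually alters a requirement is the kind of semantic evaluation the article names as the longer-term goal.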


User comments : 1

julianpenrod
not rated yet Jul 25, 2014
Essentially working to institutionalize, and so marginalize to the point of obscurity, meaning render them so unrecognized that few will try to change their rotten ways, the "tl;dr" crowd. The contemptuous, arrested development freaks who can't abide any written text unless it consists of no more than a few three sentence paragraphs, each sentence no more than seven words, each word having no more than two syllables. Or unless the text is a "Harry Potter" book. How many, incidentally, have wondered if many documents like calls for bids might actually have corrupt machinations built in, like deliberately contradictory provisions, counting on the length to dissuade many from finding them? If occurrences of words are only referenced, what indication will be of their context, whether they are, say, self contradictory? And if the software's list of occurrences of a term is 150 instances long, how many of the "tl;dr" crowd will read even the list?
