September 1, 2015

Identifying illegal websites in photos

European computer scientists have developed a way to "read" web addresses in images that could improve filters for blocking pornographic, gambling and other sites. They provide details in the new issue of the International Journal of Reasoning-based Intelligent Systems.

Internet marketers of all shades might add a website address, a URL, to a graphic or photo that might then be found through an image search engine. A user who finds such an image may be interested in visiting the site, but will have to type the URL into their browser's address bar to do so. However, the URL might instead point to illicit content: pornography, gambling sites, illegal drugs or terrorist propaganda. In that context, those in authority, whether parents and guardians of children or law enforcement, may wish to blacklist such URLs automatically.

Now, Nikolay Neshov of the Technical University of Sofia, Bulgaria, and colleagues at the University of Karlstad, Sweden, and the University of Belgrade, Serbia, have developed a computer algorithm that can detect the presence of text overlaid onto an image or a still from a video, extract the text and convert it into an active URL for accessing or blocking a website.
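To illustrate the last step, turning recognised text into a URL that can be visited or blocked, here is a minimal Python sketch. It is not the researchers' code: the regular expression, the function names `extract_urls` and `is_blocked`, and the example blocklist are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an optional scheme, a dotted host name, an optional path.
URL_PATTERN = re.compile(
    r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s]*)?",
    re.IGNORECASE,
)

def extract_urls(ocr_text):
    """Return URL-like tokens found in OCR output, each with a scheme."""
    urls = []
    for match in URL_PATTERN.finditer(ocr_text):
        # Drop punctuation that OCR or the sentence may have attached.
        candidate = match.group(0).rstrip(".,;:!?")
        if not candidate.lower().startswith(("http://", "https://")):
            candidate = "http://" + candidate
        urls.append(candidate)
    return urls

def is_blocked(url, blocked_hosts):
    """Check the URL's host (lower-cased by urlsplit) against a blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    return host in blocked_hosts

# Example with a made-up caption and a made-up blocklist.
found = extract_urls("Visit WWW.Example.com/offers today!")
blocked = [is_blocked(u, {"www.example.com"}) for u in found]
```

A real filter would probably also need to tolerate OCR mistakes such as confused characters, which this sketch does not attempt.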

Simple optical character recognition (OCR) does not work well with text overlaid on images: the background is usually complex, and the text is likely to be of lower resolution, intensity and contrast than that in a scanned document or page. The new approach uses an identification and extraction technique that finds the anomalies that appear in an image when text is overlaid on it. It then removes the detail surrounding those anomalies, leaving just the area occupied by any text; the team calls this the binarisation process. The isolated text image can then be fed into an OCR system to convert the image of the text into actual text in the computer.
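The two ideas in this pipeline, spotting blocks that look "anomalous" and reducing them to black and white, can be sketched in a few lines of Python. The paper's title indicates the team works in the DCT (discrete cosine transform) domain, but the details below are assumptions: the sketch scores a small greyscale block by the energy of its non-constant DCT coefficients and then applies a plain mean threshold, which is a stand-in rather than the authors' binarisation method.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs

def ac_energy(block):
    """Sum of squared DCT coefficients, excluding the constant (DC) term."""
    coeffs = dct2(block)
    n = len(coeffs)
    return sum(coeffs[u][v] ** 2
               for u in range(n) for v in range(n) if (u, v) != (0, 0))

def binarise(block):
    """Mean threshold: 1 where a pixel is brighter than the block average."""
    n = len(block)
    mean = sum(map(sum, block)) / (n * n)
    return [[1 if pixel > mean else 0 for pixel in row] for row in block]

# Toy 4x4 blocks: a flat background and sharp vertical strokes.
flat = [[100] * 4 for _ in range(4)]
striped = [[0, 255, 0, 255] for _ in range(4)]

flat_score = ac_energy(flat)        # close to zero: nothing stands out
striped_score = ac_energy(striped)  # large: strong edges, text-like
mask = binarise(striped)            # strokes become 1, background 0
```

Blocks with a high score would be kept as candidate text regions and passed, after binarisation, to the OCR stage; how the scores are thresholded in practice is not something the article specifies.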

The team has successfully tested its algorithm on thousands of images with overlaid URLs. Using their approach, the researchers were able to identify 619 URLs from a random selection of 1000 test images at a rate of three per second. Conventional OCR was faster but found only 83 URLs in the same 1000 images, so the new method raises the detection rate from about 8% to more than 60%.
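The percentages follow directly from the counts reported above. As a quick check, assuming each of the 1000 test images carries one URL, the arithmetic can be written out in Python (the function name `detection_rate` is only for illustration):

```python
def detection_rate(found, total):
    """Share of test images whose URL was recovered, as a percentage."""
    return 100.0 * found / total

new_method = detection_rate(619, 1000)       # about 61.9 percent
conventional_ocr = detection_rate(83, 1000)  # about 8.3 percent

# How many times more URLs the new method recovered on the same images.
improvement_factor = 619 / 83                # roughly 7.5
```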

The researchers' initial motivation was to assist computer forensic investigations in which tens of thousands of illegal and illicit photos must be scanned and any associated websites identified quickly. This is critical in investigations of child pornography and child sexual abuse, the team reports, but such work is often stymied by the vast numbers of images involved.

Given that internet search companies and other service providers are involved in various initiatives to identify and block illegal material on the internet, this new approach to URL extraction from images could be added to their arsenal of techniques for detecting such content, as well as being useful in criminal investigations surrounding it.

More information: "Finding URLs in images by text extraction in DCT domain, recognition and matching in dictionary." Int. J. Reasoning-based Intelligent Systems, Vol. 7, Nos. 1/2, pp.78–92. DOI: 10.1504/IJRIS.2015.070916

Provided by Inderscience

Citation: Identifying illegal websites in photos (2015, September 1) retrieved 29 June 2024 from https://phys.org/news/2015-09-illegal-websites-photos.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
