Collecting Website Evidence in Open Source Investigations

Aug 24, 2023


The landscape of modern investigations has undergone a paradigm shift with the advent of the digital era.

At the forefront of this transformation are open-source investigations powered by the unbounded and dynamic nature of the Internet.

Web evidence plays a crucial role within this framework, providing an invaluable resource in various contexts, from cybercrime and digital forensics to legal disputes.

The information on websites can be a valuable source of evidence, often yielding critical insights that can turn the tide in an investigation.

This post explores web evidence: the forms it can take, legal considerations, best practices, tools, and practical applications.

Types of Website Evidence

Web evidence comes in various types, each with unique characteristics and offering different data layers. This spectrum includes text, images, videos, traffic analytics, and metadata, among other things.

Text content, the most apparent form of website evidence, can offer direct insights and prove claims, given its role in conveying information explicitly.

This type of evidence can range from blogs and articles to comments and forum posts or legal disclaimers, each offering potential evidence depending on the context of the investigation.

Images and videos offer a rich, immersive form of evidence, capable of revealing intricate details and establishing a more comprehensive view of a scenario or event.

The increasingly ubiquitous presence of multimedia content on websites underscores its importance as evidence.

Some estimates suggest it would take 17,810 years of continuous viewing to watch everything on YouTube, assuming all uploads stopped today, which they won’t! (Source: Wyzowl)

Traffic analytics constitutes another critical facet of website evidence. By examining visitor statistics, browsing patterns, time spent on site, traffic flows, and other activity metrics, investigators can infer crucial behavioral patterns and dynamics and quantify particular issues on the web.

Lastly, metadata, the hidden layer of information accompanying every piece of digital content, is often a goldmine of contextual and credibility-establishing information.

It can reveal when, how, and possibly by whom a piece of content was created, crucially bolstering its value as evidence.

Considerations When Gathering Website Evidence

While website evidence is invaluable, its acquisition must be exercised with caution and respect for legal and ethical considerations.

The Internet is not an ungoverned space, and data harvesting activities are subject to privacy laws, regulations, and the terms of service set by individual websites.

Any violation of these conditions can lead to legal repercussions and might compromise the credibility and admissibility of the evidence gathered. Therefore, a legally compliant approach to gathering website evidence is paramount.

There is no shortage of suppliers willing to sell you data collected in breach of a platform's terms of use (Facebook being a prime example), oblivious to the legal harm they expose their subscribers to.

Best Practices for Gathering Website Evidence

Ensuring that the evidence collection process aligns with the highest possible standards is crucial, particularly for cases where strict legal or regulatory bodies might scrutinize the evidence.

The collection process should follow a well-defined, robust, and repeatable methodology that can be validated and endorsed by a trusted third party, such as a forensic expert or a software suite specializing in evidence collection.

The ever-evolving nature of the web also calls for the immediate capture of web pages when found.

Online content is dynamic and constantly in flux, with web pages frequently updated or deleted at a moment’s notice. Thus, preserving a webpage at the point of discovery is vital to prevent the loss of potential evidence.

In addition to capturing the surface content, it can be helpful to gather the underlying source files, such as HTML or CSS, as these files often contain additional data or context critical to an investigation.
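As a rough illustration of capturing source files at the point of discovery, the sketch below records the URL, a capture time, and a SHA-256 hash of the raw page bytes. The function name `build_capture_record`, the field names, and the sample page are assumptions made for this example, not part of any particular tool; the hash simply gives a later reviewer a way to confirm the stored file has not changed.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def build_capture_record(url: str, body: bytes,
                         captured_at: Optional[datetime] = None) -> dict:
    """Describe one captured page: where it came from, when, and a content hash."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "size_bytes": len(body),
        # SHA-256 of the raw bytes lets anyone later confirm the file is unchanged.
        "sha256": hashlib.sha256(body).hexdigest(),
    }


# Hypothetical page source. In practice the bytes would come from the live
# response (for example urllib.request.urlopen(url).read()) and be saved
# to disk unmodified alongside this record.
sample_html = b"<html><head><title>Example</title></head><body>Hello</body></html>"
record = build_capture_record(
    "https://example.com/page",
    sample_html,
    datetime(2023, 8, 24, 12, 0, tzinfo=timezone.utc),
)
print(record)
```

Hashing the bytes exactly as received, before any parsing or re-encoding, keeps the record tied to what the server actually returned.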

Furthermore, safe storage practices will be fundamental to maintaining the integrity and security of the gathered evidence.

These practices might include using encryption (in transit and at rest) and digital signatures to safeguard the evidence from unauthorized access or manipulation. Maintaining strict control over who can access which part of the evidence is also crucial, and can be facilitated by software that provides Role-Based Access Controls (RBAC) and Conditional Access Controls (CAC), such as AKTEK iO.
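To make the idea of tamper detection concrete, here is a minimal sketch using a keyed hash (HMAC-SHA256) from Python's standard library. Note the assumption: an HMAC is a stand-in here, not a true public-key digital signature, and the key, the helper names `seal` and `is_untampered`, and the sample content are all hypothetical.

```python
import hashlib
import hmac


def seal(evidence: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the evidence bytes."""
    return hmac.new(key, evidence, hashlib.sha256).hexdigest()


def is_untampered(evidence: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(seal(evidence, key), tag)


# Hypothetical key for illustration only; a real key would be generated
# randomly and held in a secrets manager, never in source code.
key = b"demo-key-for-illustration-only"
original = b"<html><body>Odds: 5/1</body></html>"
altered = b"<html><body>Odds: 9/1</body></html>"
tag = seal(original, key)

print(is_untampered(original, key, tag))  # True
print(is_untampered(altered, key, tag))   # False
```

Anyone holding the key can verify a file, but also forge a tag, so teams that need third-party verifiability would likely prefer an asymmetric signature scheme.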

The issue of spoliation (the intentional or unintentional destruction or alteration of evidence) should not be overlooked.

Investigators should advise their clients against any actions that might lead to spoliation, so that potential online evidence is preserved, and should maintain a clear audit trail of any changes to evidence data (what changed, when, and by whom).
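One way to picture an audit trail that answers "what changed, when, and by whom" is an append-only log in which each entry includes a hash of the previous one, so that editing an earlier entry becomes detectable. The sketch below is an assumption-laden illustration: the `AuditTrail` class, its field names, and the actor and file names are invented for this example.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log in which each entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "entry_hash"}
        text = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record(self, actor: str, action: str, item: str) -> dict:
        entry = {
            "actor": actor,        # by whom
            "action": action,      # what changed
            "item": item,
            "at": datetime.now(timezone.utc).isoformat(),  # when
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else "",
        }
        entry["entry_hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check that no entry was edited, removed, or reordered."""
        prev = ""
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["entry_hash"] != self._digest(entry):
                return False
            prev = entry["entry_hash"]
        return True


trail = AuditTrail()
trail.record("analyst_a", "captured", "page-001.html")
trail.record("analyst_b", "viewed", "page-001.html")
intact_before = trail.verify()
trail.entries[0]["actor"] = "someone_else"  # simulate tampering
intact_after = trail.verify()
print(intact_before, intact_after)  # True False
```

An in-memory list is used only to keep the sketch short; the same chaining idea applies to entries written to a database or file.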

Lastly, ethical considerations should always guide the investigation. The anonymity and vastness of the Internet should not serve as an excuse to breach ethical boundaries.

It is vital to avoid pitfalls such as direct communication with represented parties via deceptive means or misrepresentation through fake profiles and online accounts.

Upholding ethical integrity will always be as crucial to the investigation as the quality of the evidence gathered.


Tools for Gathering Website Evidence

There is a wealth of tools that can streamline and enhance the process of gathering website evidence.

Web scraping tools can efficiently automate the extraction of large volumes of data from websites and search engines, reducing the time and effort required for manual data collection.
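As a small, hedged example of what automated extraction can look like, the sketch below pulls outbound links from a page using only Python's standard library. The `LinkCollector` class and the sample HTML are made up for illustration, and any real collection would still need to respect the site's terms of service and applicable law.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


# Hypothetical page content standing in for a fetched document.
page = """
<html><body>
  <a href="/promo/world-cup">World Cup offer</a>
  <a href="https://affiliate.example.net/join?ref=123">Join now</a>
  <a name="top">No link here</a>
</body></html>
"""
collector = LinkCollector("https://example.com/home")
collector.feed(page)
print(collector.links)
```

Resolving links to absolute URLs at extraction time makes it easier to trace relationships between pages, such as an affiliate page pointing to an operator.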

Website archiving services can also offer a way to preserve the state of a website at a particular moment in time, which is essential given the fluidity of online content.

These services can capture a webpage snapshot, storing it for future reference and analysis.
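For instance, the Internet Archive's Wayback Machine exposes an availability endpoint that reports the closest stored snapshot of a URL. The sketch below builds such a query and parses a response; the helper names are assumptions, and the sample response is hand-written for illustration in the documented shape rather than real output from the service.

```python
import json
from urllib.parse import urlencode

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Build the query URL asking for the snapshot closest to a timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDDhhmmss format
    return f"{WAYBACK_AVAILABILITY}?{urlencode(params)}"


def closest_snapshot(response_text: str):
    """Return (snapshot_url, timestamp) from a response, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


query = availability_query("example.com/page", "20230824")
print(query)

# Hand-written sample response, assumed to mirror the service's JSON shape.
sample_response = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20230824120000/http://example.com/page",
            "timestamp": "20230824120000",
            "status": "200",
        }
    }
})
print(closest_snapshot(sample_response))
print(closest_snapshot('{"archived_snapshots": {}}'))  # None
```

A third-party archive complements, rather than replaces, an investigator's own capture, since it offers an independent record of what a page showed at a given time.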

Metadata extraction tools can delve into the hidden layers of the website content, revealing the context and details embedded in the metadata.

This information can provide valuable insights and bolster the credibility of the evidence.
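To give a sense of what page-level metadata can look like, the sketch below reads `<meta>` tags from a page's HTML with Python's standard library. The `MetaCollector` class and the sample page (author, publication time, generator) are hypothetical; real pages vary widely in which tags they include, and these values can be edited by a site owner, so they are leads to corroborate rather than proof.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/property -> content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Standard tags use "name"; Open Graph style tags use "property".
        key = attributes.get("name") or attributes.get("property")
        if key and attributes.get("content") is not None:
            self.meta[key] = attributes["content"]


# Hypothetical page head standing in for a captured document.
page_head = """
<head>
  <meta charset="utf-8">
  <meta name="author" content="Jane Doe">
  <meta property="article:published_time" content="2023-08-24T09:30:00Z">
  <meta name="generator" content="WordPress 6.3">
</head>
"""
collector = MetaCollector()
collector.feed(page_head)
print(collector.meta)
```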

Case Study: Monitoring Black Market Gambling in the UK During the World Cup

Some organizations are on the leading edge of web evidence gathering. Yield Sec is one such organization engaged in constant monitoring of online activity relating to gambling and sports betting.

Of particular focus is identifying and tracking unlicensed operators, along with the affiliate marketing pages and social influencers that give them oxygen.

During the last Football World Cup, its forensic experts were tasked with uncovering the extent of black-market gambling activities proliferating online and targeting vulnerable demographics in the UK.

Yield Sec analysts leveraged a suite of web scraping tools to extract data and evidence from unlicensed gambling websites, enabling the identification of modus operandi and quantifying the scale of these crimes.

By employing rigorous data capture standards and secure storage methods, they ensured the integrity of the digital evidence gathered.

The entire process was conducted with the utmost regard for ethical considerations and privacy laws, underpinning the credibility of the investigation.

The collected evidence played a significant role in illuminating the extent of these illicit activities, informing policymakers on the effectiveness of current controls and the best way forward to protect the UK market and its citizens.

Source: iGamingBusiness

In summary

The utility of website evidence in open-source investigations is profound. Spanning diverse types such as text, images, videos, traffic analytics, and metadata, it offers a wealth of information that can significantly affect the course of an investigation.

Nonetheless, the gathering process demands meticulous attention to legal considerations, ethical guidelines, and best practices.

Employing tools like web scraping, archiving services, metadata extractors, and secure tiered access platforms can simplify and enhance this process.

As we tread further into the digital age, the importance and relevance of website evidence in open-source investigations can only be expected to grow.

AKTEK is proud to support research & investigation teams looking into some of the most complex issues of our time.

AKTEK iO provides a comprehensive suite of tools for gathering website evidence, spanning search engine results, app store results, web traffic data, and precise traffic flows in and out of web pages. Find out more by getting in touch with one of our experts!

We empower individuals & organizations addressing complex global challenges with the tools and intelligence they need to safeguard people, assets & operations.