AMPLYFI Insights | Company blog

Data discovery: How do we process the ocean of information on the web for better decision making?

By Gordon Fraser, Engagement Manager
June 8, 2022

Gordon, AMPLYFI’s Engagement Manager, is responsible for building and nurturing relationships with key clients. He has spent the past 10 years in the disruptive technology space, providing a global customer base with the tools necessary to succeed in today’s competitive environment.

We are experiencing a period of rapid data growth. In 2015, the amount of information available on the web hit 1 zettabyte (ZB); by 2020 this number had grown to 40ZB, and according to Cisco’s Global Web Traffic Report, it is set to reach 174ZB by 2025. To put this into perspective, downloading 174ZB at the average global broadband speeds currently available would take over 400 million years.
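For readers who want to sanity-check that figure, here is a minimal back-of-the-envelope sketch. The connection speed of 100 Mbit/s is our own illustrative assumption (the article does not state which average it used), and the function name is hypothetical.

```python
# Rough download-time estimate for 174 ZB of data.
ZETTABYTE = 10**21                      # bytes, decimal (SI) definition
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # average year length in seconds


def download_years(total_bytes: float, speed_mbps: float) -> float:
    """Years needed to download total_bytes at speed_mbps megabits per second."""
    total_bits = total_bytes * 8
    seconds = total_bits / (speed_mbps * 1_000_000)
    return seconds / SECONDS_PER_YEAR


# ASSUMPTION: 100 Mbit/s stands in for "average global broadband speed".
years = download_years(174 * ZETTABYTE, speed_mbps=100.0)
print(f"{years / 1e6:.0f} million years")
```

Under that assumed speed the estimate lands at roughly 440 million years, which is consistent with the "over 400 million years" quoted above. A slower or faster assumed connection scales the answer proportionally.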

It is also predicted that there will be 5.3 billion total internet users by 2023, up from 3.9 billion in 2018. That’s a staggering 66 percent of the population online, continuously adding data to multiple endpoints. On top of this, with the advancement of technology like Elon Musk’s Starlink, people in even the most remote locations will be connected, further increasing these figures. It is safe to say that the amount of data available on the internet is vast, unstructured and near immeasurable, and will increase in abundance in the foreseeable future.

Currently, most of the focus is on the surface web, which accounts for less than 5% of the total information available on the internet. This data is already indexed by standard search engines like Google, Yahoo! and Bing, which is why finding and accessing it is relatively easy. But what about the deep web? The deep web consists of all non-indexed, unstructured data, primarily found behind paywalls, logins and authentication forms. This data is extremely difficult to structure and analyse, so a business with the ability to do just that would gain a competitive advantage.

There are platforms on the market that successfully access and ingest this information at scale, but few of them provide reliable insight extraction models that can be used for market intelligence and clinical decision-making. 

So how do we access this unstructured, deep web information in order to empower our businesses to make better, faster decisions? First, let’s explore the difference between structured and unstructured data.

Structured vs Unstructured Data

Not all data is created equal; it is usually classified as either structured or unstructured. This classification has an impact on how data is collected, processed and presented.

Structured data is found in pre-defined models, e.g. templates or spreadsheets, and is highly organised and easily analysed. An example of this would be point-of-sale information structured into rows of purchase transactions in an Excel spreadsheet.

On the other hand, unstructured data is usually text-heavy and quite difficult to analyse. Social media posts containing ideas and opinions are a good example of this. Although elements like hashtags do structure some of the content, the majority remains vastly unstructured. Other examples of unstructured data include intranets, company wikis, email folders, network drives, academic journals, literature and archives to name a few.
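To make the contrast concrete, the sketch below puts the two kinds of data side by side. The sales records and the social media post are invented purely for illustration; the point is that structured rows can be aggregated directly, while free text only offers small structural hooks such as hashtags.

```python
import re

# Structured: every record follows the same pre-defined schema,
# so analysis is a direct calculation over known fields.
sales_rows = [
    {"item": "coffee", "quantity": 2, "unit_price": 3.50},
    {"item": "bagel", "quantity": 1, "unit_price": 2.25},
]
total_revenue = sum(row["quantity"] * row["unit_price"] for row in sales_rows)

# Unstructured: free text with no schema (hypothetical example post).
post = "Loving the new electric range, but charging is still a pain #EV #automotive"

# Hashtags are the only part that is trivially machine-readable...
hashtags = re.findall(r"#(\w+)", post)

# ...while the opinion itself stays locked in the remaining free text.
remaining_text = re.sub(r"#\w+", "", post).strip()

print(total_revenue)
print(hashtags)
print(remaining_text)
```

Extracting the hashtags takes one line; working out that the remaining sentence is a mixed opinion about charging is the hard part, and that is where the techniques discussed later come in.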

Whilst there is merit in analysing structured data, it is readily available to most companies through an array of applications, so it is unlikely to give you a competitive edge. If you had the capability to extract insights from an untapped resource like unstructured data, however, the benefits would include, but not be limited to, reduced time and effort spent on analysis, higher quality insights, more innovation, and better decision making and foresight.

Why Focus on Unstructured Data?

It is estimated that 80 to 90 percent of all data collected by companies is unstructured, and that this data is growing faster than structured data, at roughly 55% – 65% per year. But it is also extremely complex to analyse. If it could be analysed accurately and rapidly, it would provide businesses with a distinct competitive advantage. There are, however, few platforms available in today’s market that can readily access and analyse this type of data at scale.

So, why is unstructured data important? The simple answer is that intelligence is hidden in the millions of documents or presentations which are difficult to access and even harder to read and analyse in large volumes. Internal proprietary data in large organisations is often siloed and inaccessible, which leads to mass duplication as employees are unaware of what exists on their network. Picture a sales team’s folder in a company employing 1,000 Sales Development Representatives (SDRs). If 100 of those people leave next month, how do sales managers know what potential opportunities lie within the hundreds of documents those employees produced? Unless these documents are searchable and easily accessible at scale, the data will likely lie dormant for decades, leading to missed opportunities.

Similarly, external public data sources such as news outlets publish enormous amounts of content. The Associated Press publishes over 1,200 articles per day, which is more content than an analyst could hope to read in a week from a single source. If we wanted to analyse, say, 1,000 news sources, the time it would take for humans to find, read and analyse the content would grow to many lifetimes, which is clearly impossible. The opportunity is therefore there for organisations to leverage machine analysis to read and analyse hundreds of thousands of documents per day (a lifetime of human reading roughly every 20 minutes) and draw insights for effective decision making.

Data Analysis Tools

In a 2019 Deloitte survey, only 37 percent of executives in the sample believed their companies were insights-driven; the remaining 63 percent were aware of this data but did not have the infrastructure to harness its potential. Below we discuss three machine-driven technologies that organisations can use to overcome this barrier.

Natural Language Processing (NLP)

Natural Language Processing, or NLP for short, is a branch of Artificial Intelligence concerned with machine understanding and interpretation of human language. By utilising NLP, one is able to structure data and produce useful insights through techniques such as translation, sentiment analysis, entity recognition, topic modelling and segmentation, and relationship analysis.

The above figure depicts how AI and machine learning can be used to understand language at a sentence level. The machine identifies which entity is the subject (Ford) and which sectors and topics it relates to (automotive and carbon emissions respectively). It is also able to analyse the polarity of the connection between them, indicating the presence of positive or negative sentiment linked to the entity in question.

Further complexity exists when the machine needs to disambiguate words in order to understand the true meaning of a sentence. An example of this is being able to differentiate between distinct senses of the word “bank”:

  • Barclays Bank.
  • The bank of a river.

The first phrase refers to a commercial bank, and the second to a river bank. Detecting the ambiguity is the first hurdle; resolving it and displaying the correct output is the second. A well-trained model works out the intended meaning of an ambiguous word from the sentence in which it appears, increasing the accuracy of the output produced.
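The idea of using surrounding words to pick a sense can be illustrated with a deliberately tiny sketch. It uses a simple context-overlap heuristic (in the spirit of the classic Lesk approach), not the models a production NLP platform would use, and the cue-word lists and function name are our own hypothetical choices.

```python
# ASSUMPTION: hand-written cue words for each sense of "bank" (illustrative only).
SENSES = {
    "financial institution": {"account", "loan", "deposit", "barclays", "money", "branch"},
    "river bank": {"river", "water", "shore", "fishing", "mud", "stream"},
}


def disambiguate_bank(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    tokens = {token.strip(".,!?").lower() for token in sentence.split()}
    scores = {sense: len(tokens & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words there is no evidence for either sense.
    return best if scores[best] > 0 else "unknown"


print(disambiguate_bank("Barclays Bank."))
print(disambiguate_bank("The bank of a river."))
```

Real systems learn these contextual cues from large volumes of text rather than from a hand-made list, but the principle is the same: the neighbouring words decide the meaning.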

Another popular use case for NLP is tracking brand sentiment: in other words, whether content about a brand is written in a negative, positive or neutral tone. Many companies track thousands of user comments to derive an overall sentiment score for a specific subject. This helps organisations better understand their audience and equips them to make more informed responses. A seamless, AI-powered user interface is thus key to presenting this information in an easily accessible manner.
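As a rough illustration of how many comments can be rolled up into one score, the sketch below uses a small hand-made word list. This lexicon approach is a simplification we have chosen for readability; the word lists, function names and sample comments are all hypothetical, and production systems typically rely on trained models instead.

```python
# ASSUMPTION: tiny illustrative sentiment lexicons, not a real resource.
POSITIVE = {"love", "great", "reliable", "excellent"}
NEGATIVE = {"hate", "poor", "broken", "terrible"}


def comment_score(comment: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) for one comment."""
    words = [word.strip(".,!?").lower() for word in comment.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    if positives > negatives:
        return 1
    if negatives > positives:
        return -1
    return 0


def brand_sentiment(comments: list[str]) -> float:
    """Average comment score, ranging from -1.0 to +1.0."""
    if not comments:
        return 0.0
    return sum(comment_score(c) for c in comments) / len(comments)


comments = [
    "I love this car, great mileage!",
    "Excellent build, reliable engine.",
    "Terrible service and a broken handle.",
    "It arrived on Tuesday.",
]
print(brand_sentiment(comments))
```

Two positive comments, one negative and one neutral give an overall score of 0.25 on a scale from -1 to +1. Tracking that number over time is what turns individual comments into a usable signal.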

Enterprise Search Engines

International Data Corporation (IDC) data shows that the average knowledge worker spends about 2.5 hours per day, or roughly 30% of the workday, searching for information. Most of the time they are unable to find the information they need to do their job effectively, leading to inefficiencies across the organisation. A 2018 Nintex survey of 1,000 U.S. employees across various functions and departments found that 62% highlighted broken IT processes as a leading issue hampering digital transformation, and 49% said they had trouble locating documents. Simplifying IT processes through enterprise search can enhance employee productivity and performance, leading to reduced costs and increased efficiency. But first, let’s differentiate between web (external) and enterprise (internal) search platforms.

Standard search engines like Google and Yahoo! have become so entrenched in our everyday lives that we don’t think twice before using them to find information. But although there are foundational similarities between web and enterprise search, their functions are quite distinct. The term “enterprise search” describes the software used to find information within an organisation, which allows authorised employees to search multiple applications in a single user interface.

The information contained within an organisation can be fragmented and dispersed to all corners of its internal network. The purpose of enterprise search is therefore to centralise and index that information, breaking down silos and making it easily accessible to all.

This allows organisations to accelerate workflows by providing the right information, ensure results are of the highest quality, and save employees time spent searching, ultimately leading to greater efficiencies and reduced costs.
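The "centralise and index" step can be sketched with a minimal inverted index: a lookup from each word to the documents that contain it, built once across every source so that a single query searches them all. The source names and document contents below are invented for illustration, and a real enterprise search product would add access controls, ranking and much more.

```python
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token.strip(".,!?")].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word in the query."""
    terms = [term.strip(".,!?") for term in query.lower().split()]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# ASSUMPTION: hypothetical documents scattered across a wiki, email and a shared drive.
documents = {
    "wiki/pricing-policy": "Regional pricing policy for enterprise customers",
    "email/thread-114": "Customer asked about enterprise pricing in Q3",
    "drive/onboarding.docx": "Onboarding checklist for new sales hires",
}
index = build_index(documents)
print(search(index, "enterprise pricing"))
```

Here one query returns both the wiki page and the email thread, even though they live in different systems, which is the de-siloing effect described above.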

Monitoring and Alerting

Once an organisation has gathered data from structured and unstructured sources, the process of analysing it begins. But what about next week, next month or next year, when market or consumer behaviour has changed? How will you know that your data is still valid? These are complex challenges that require monitoring and alerting mechanisms to ensure changes to data are tracked on a continuous basis. The advantages are clear: by tracking data, an organisation is equipped to monitor early warning signals to mitigate disruptions to business processes, uncover risks linked to specific forms of data, and unearth opportunities in new data that ultimately feed the direction of the business.

With that in mind, it is important to use technology such as NLP and enterprise search in conjunction with monitoring and alerting models to ensure users stay ahead of new information collected from current sets of structured and unstructured data. 

Analysts also cannot afford to trawl large data sets on the slim chance that a valuable piece of information may reside there. Instead, they should spend their time analysing data only when it contains a meaningful insight or change. Alerts engage the analyst when there is something they need to review; conversely, the absence of a daily alert gives the analyst peace of mind that nothing of value has been picked up, saving them time otherwise spent needlessly reading and analysing. It can be argued that monitoring and alerting is the biggest enabler of efficiency and productivity, ensuring you don’t miss pivotal information. It makes the process systematic rather than opportunistic.

Monitoring and alerting models such as those provided by AMPLYFI constantly crawl pre-defined data lakes for new information, or look for changes to specific pieces of information. An example would be regulatory monitoring; the platform is capable of monitoring thousands of URLs or documents containing regulatory information, and will alert the user to thematic changes or increased chatter around a specific subject, ultimately empowering companies to stay abreast of requirements in specific regions. The model therefore applies to any organisation that wants to constantly monitor changes to other avenues such as markets, news, regulation, companies, risks, opportunities, legislation, and internal documents to name a few. If it’s accessible by the platform, it can be monitored. 
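In its simplest form, this kind of monitoring comes down to two checks: has a source changed since the last visit, and does the new content mention something the user cares about? The sketch below shows that idea only; it is not a description of AMPLYFI’s implementation, and the source names, texts and function names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a document's content, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_alerts(previous: dict[str, str], current: dict[str, str],
                     watch_phrase: str) -> list[str]:
    """Return sources whose content changed AND mentions the watch phrase.

    previous maps source -> fingerprint from the last crawl;
    current maps source -> freshly fetched text.
    """
    alerts = []
    for source, text in current.items():
        changed = previous.get(source) != fingerprint(text)
        if changed and watch_phrase.lower() in text.lower():
            alerts.append(source)
    return alerts


# ASSUMPTION: invented snapshots of three monitored sources.
last_crawl = {
    "regulator/guidance": fingerprint("Capital rules unchanged this quarter."),
    "news/markets": fingerprint("Markets were calm on Monday."),
    "news/tech": fingerprint("New phone released."),
}
todays_crawl = {
    "regulator/guidance": "Capital rules unchanged this quarter.",
    "news/markets": "Analysts warn of rising default risk among bond issuers.",
    "news/tech": "New phone released with a bigger screen.",
}
print(check_for_alerts(last_crawl, todays_crawl, "default risk"))
```

Only the markets feed triggers an alert: the regulator page has not changed, and the technology article changed but does not mention the watched phrase. The analyst is interrupted once, for the one item that matters.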

As can be seen in the above figure, monitoring has been set up for a financial entity, in this case JP Morgan Chase, with an alert tracking chatter on a key phrase related to adverse risks. One of the main contributors to the 2008 financial crisis was default risk: the possibility that the issuer of a bond would not be able to repay the underlying principal or make scheduled interest payments. Using AI, we will eventually be able to monitor early warning signals of adverse events as a preventative measure, empowering organisations with the best tools for growth.


Make better decisions, faster

The problem many businesses face today is that they are drowning in data, and conventional tools used to analyse it fall short each time. According to Gartner, 87 percent of organisations have low business intelligence and analytics maturity. This is a large barrier to overcome for organisations that want to enhance the value of their data through AI technology. Gartner’s IT score for data analytics classifies organisations with low maturity into “Basic” or “Opportunistic” categories. Organisations with a “Basic” score have intelligence capabilities that are predominantly spreadsheet-based, which for modern businesses is not good enough to remain competitive.

The problem we see here is that far too many companies (87 percent according to Gartner’s score) currently rely on human data harvesting and classification methods. Although we may never rely solely on AI to make decisions, relying on humans conducting manual processes alone is not optimal: it is error-prone and subject to cognitive bias, slow analysis and siloed workflows. We are unfortunately capped when it comes to time and accuracy, but with access to machine-driven tools we are able to increase scale and efficiency while reducing errors and noise, producing more accurate results that help drive our businesses in a positive, more profitable direction. It is unlikely that machines will ever replace human analysts; rather, they will equip them to focus their efforts on the information that matters, leading to better productivity.

AMPLYFI has developed a market-leading platform which helps companies find automated insights from millions of datasets in a matter of seconds. Bridging the gap between human and machine, it enables organisations to spend more time making decisions that matter rather than searching for information, ultimately giving the edge against competitors through better foresight into the market.



Used by: Companies that want to make faster, more accurate and more consistent decisions to mitigate risks, identify opportunities and ultimately boost revenue. 

AMPLYFI is a tech scaleup company using AI and NLP to analyse unstructured data. Our platforms are connected to thousands of sources that extract and analyse millions of documents from the surface and deep web in a matter of clicks. We are transforming decision making through our suite of data optimisation SaaS products. AMPLYFI’s reports incorporate unstructured data analysis techniques such as topic, sentiment, risk, entity, and geographic analysis to help organisations make decisions aligned with factual market-based data.

An overview of AMPLYFI’s model can be seen in the figure below. Data is pulled from a vast array of sources and fed into our platform where it is structured and analysed by the machine reading process. We intuitively connect to a variety of predefined sources relevant to the customer’s key topics of interest, but also provide our customers the option to add unique sources ensuring a dynamic process that produces unique insights.

Some applications of our tools are:

  • Internal proprietary data search, connecting information spread across company networks.
  • Trend analysis reading patent and news documents.
  • Gaining niche insights into consumer behaviour and trends.
  • Analysing communications for regulatory compliance. 
  • Detecting and monitoring early warning signals for asset managers. 

We have also incorporated enterprise search as a resource within our current platform, meaning that an organisation can connect to both web and enterprise-level information, providing a turnkey solution for effective and efficient data gathering. There is an opportunity for companies to utilise this technology to save time, increase employee efficiency, decrease costs, and ultimately surpass competitors for increased market share.

Final Word

Global data looks set to continue growing rapidly for the foreseeable future. How organisations adapt to harness the insights that lie deep within this data will ultimately dictate how quickly they can make decisions and set the best direction for their business.

The sheer scale and nature of data can be overwhelming for any organisation, but being equipped to analyse it and apply the learnings to your business is key. Whether you are a seasoned analyst or a small business owner, knowledge can be harnessed to help you make the best possible decisions for the success of your business. 

At AMPLYFI, we are building a world where anyone in an organisation can leverage information to make measurably better decisions, and change with conviction. Let’s have a conversation: talk to one of our consultants today or start a Free Trial to discover the potential for yourself.