The rush to develop AI is leading to some fascinating applications. Take natural-language processing (NLP), a subfield of AI concerned with the interaction between computers and human language. It enables computers to interpret human language and then generate data from that interpretation, according to Inuvo’s data scientist, Melodie Du.
Her company is expanding on the concept of NLP by creating a language-model-based, generative AI capable of identifying the words associated with the audience for any product, service, or brand. “The result is the ability to reach those audiences utilizing our AI systems without requiring any client or third-party data,” she explained.
Charlene Weisler: How does Inuvo’s NLP facilitate consumer intent measurement?
Melodie Du: We convert the entire open web into an interconnected language model of what we call intent signals. Then we assign categories, sentiment, and deduced demographic info to these signals based on a series of interconnected AI systems.
This means our AI breaks down, in real time, any piece of information to its core signals and then aggregates them to determine whether it matches the custom intent models our AI builds for each client’s product, service or brand. This approach allows for a meaningful analysis of all ad impressions even when there isn’t a cookie or an ID-based profile available, including impressions from Safari.
Weisler: You speak of vectorization. What is that?
Du: In NLP, vectorization refers to the process of converting text data into numerical vectors or arrays that can be processed by machine-learning algorithms. Vectorization is a critical step in many NLP tasks, such as text classification, sentiment analysis, and language modeling.
Once text data is converted into numerical vectors, machine-learning algorithms can more readily identify patterns and relationships in the data and make predictions or classifications based on them.
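To make the idea concrete, here is a minimal sketch of one of the simplest forms of vectorization, a bag-of-words count vector. It is an illustration only and is not Inuvo's method; the function names and the two sample sentences are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return a count vector for one document; tokens outside the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    # Iterating a dict yields keys in insertion order, which matches the column order.
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical sample texts, chosen only for illustration.
docs = ["running shoes for trail running", "trail maps and hiking boots"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Each text becomes a list of numbers with one position per vocabulary word, so the two sentences can be compared or fed to a classifier. Production systems typically use weighted or learned representations rather than raw counts.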
Weisler: How does it meet or improve measurement in a privacy-compliant manner?
Du: Without cookies, attribution on a per-source basis is difficult for anyone, especially in channels like display and CTV, where the majority of conversions are not associated with clicks and thus cannot have click IDs appended. By analyzing the constant variations in spend between traffic and spend sources, our AI is able to probabilistically determine which channels or tactics are driving conversions, totally agnostic of cookies and identifiers.
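One simple way to picture attribution from spend variation alone is an ordinary least-squares fit of daily conversions against daily spend per channel. The sketch below is a stand-in technique chosen for illustration, not the probabilistic model Du describes; the channel names, spend figures, and conversion counts are invented, and the model assumes two channels and no baseline conversions.

```python
def fit_two_channel(spend_a, spend_b, conversions):
    """Least-squares fit of conversions ~ wa * spend_a + wb * spend_b (no intercept).

    Solves the 2x2 normal equations directly, so no third-party library is needed.
    """
    saa = sum(a * a for a in spend_a)
    sbb = sum(b * b for b in spend_b)
    sab = sum(a * b for a, b in zip(spend_a, spend_b))
    say = sum(a * y for a, y in zip(spend_a, conversions))
    sby = sum(b * y for b, y in zip(spend_b, conversions))
    det = saa * sbb - sab * sab
    if det == 0:
        # Spend in the two channels moved in lockstep, so their effects cannot be separated.
        raise ValueError("channel spends are collinear")
    wa = (say * sbb - sby * sab) / det
    wb = (sby * saa - say * sab) / det
    return wa, wb


# Invented daily figures: spend per channel and total conversions, with no IDs involved.
display_spend = [100, 200, 100, 300]
ctv_spend = [50, 50, 150, 100]
daily_conversions = [4.5, 6.5, 9.5, 11.0]

display_rate, ctv_rate = fit_two_channel(display_spend, ctv_spend, daily_conversions)
```

The fitted rates estimate conversions per unit of spend for each channel using only aggregate totals. The estimate is only possible because spend varies independently across channels from day to day.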
Weisler: Can you give me an example of how this all works?
Du: When conducting a contextual analysis of a web page, hundreds or thousands of signals may be returned, but most of them are usually not related to the main topic. We use a combination of methods—including frequency-based probabilities, concept-graph weights, taxonomy and vectorization of the concepts—to filter out irrelevant signals and suggest relevant ones that may not be explicitly mentioned in the text.
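As a rough illustration of how frequency and vectorization can work together in such filtering, the sketch below builds a frequency-weighted topic vector for a page and keeps only the candidate signals whose vectors point in a similar direction. Everything here is assumed for the example: the signal names, the two-dimensional vectors, the frequencies, and the 0.5 threshold. It omits the concept-graph and taxonomy steps Du mentions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_signals(signal_vectors, frequencies, threshold=0.5):
    """Keep signals whose vector is close to the frequency-weighted page topic."""
    dims = len(next(iter(signal_vectors.values())))
    topic = [0.0] * dims
    for name, vec in signal_vectors.items():
        weight = frequencies.get(name, 0)  # signals absent from the page get weight 0
        for i, value in enumerate(vec):
            topic[i] += weight * value
    return {name for name, vec in signal_vectors.items() if cosine(vec, topic) >= threshold}


# Hypothetical concept vectors and on-page frequencies.
signal_vectors = {
    "marathon": [1.0, 0.1],
    "sneakers": [0.9, 0.2],
    "cookie banner": [0.0, 1.0],
    "trail running": [0.95, 0.05],  # candidate concept that never appears on the page
}
frequencies = {"marathon": 5, "sneakers": 3, "cookie banner": 1}

relevant = filter_signals(signal_vectors, frequencies)
```

In this toy example "cookie banner" appears on the page but is dropped as off-topic, while "trail running" is retained even though it is never mentioned, because its vector is close to the page's dominant topic.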
Weisler: What are the challenges with these methods (NLP and vectorization)?
Du: One of the main challenges of NLP is the ambiguity and complexity of natural language. Natural language is full of nuances, idioms, expressions, and sarcasm, which are difficult for machines to understand. The meaning of a sentence can vary depending on the context, and sometimes even subtle changes in wording can alter its meaning. The challenge of vectorization is to represent complex data in a way that can be efficiently processed by algorithms.
The challenge here is scale. When you’re dealing with high-dimensional data, such as text, the number of features can be in the millions or billions. Based on Inuvo’s domain expertise, we developed our own model to make this procedure less time-consuming while maintaining compactness and precision.
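Inuvo's model is not described here, but a widely used general technique for keeping text vectors compact is feature hashing (the "hashing trick"), which maps an unbounded vocabulary into a fixed number of buckets. The sketch below assumes a hypothetical `hashed_vector` helper and an arbitrary bucket count of 16; real systems use far more buckets to limit collisions.

```python
import hashlib


def hashed_vector(text, n_buckets=16):
    """Count tokens into a fixed-size vector using a stable hash of each token."""
    vec = [0] * n_buckets
    for tok in text.lower().split():
        # hashlib gives the same digest on every run, unlike Python's salted built-in hash().
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        vec[index] += 1
    return vec


sample = hashed_vector("trail running shoes for trail runners")
```

The vector length stays at `n_buckets` no matter how many distinct words appear, which trades a small amount of precision (different words can collide in one bucket) for a bounded memory footprint.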
Weisler: How can these methods be best implemented?
Du: Our team has implemented custom pipelines for scaling our AI algorithms to handle big data, using technologies like Spark, Hadoop, and MapReduce, as well as implementing our models in both Java and Python—and we hold several key patents for these components.