How AI brought ≈ to computers
Computers follow exact rules. They understand logic like "Delhi" ≠ "Chennai" or "Population of Delhi" > 30 million. However, they struggle to understand the concept of similarity (≈). For example, to us, "Delhi" ≈ "Noida" because both are part of the National Capital Region, but computers can't grasp that naturally. To make computers as intelligent as humans, we first need to teach them the concept of similarity (≈). How do we do that?
It's all about distance
In the Tamil movie Puthumaipithan, Vadivelu plays the role of a politician who explains to journalists why places near Delhi have more political influence than Chennai. His answer? "It’s distance." He goes on to propose moving Chennai closer to Delhi to gain more influence. While this idea may seem absurd at first, it's actually similar to how AI works. AI uses 'distance' to understand how close or similar different entities are. In this post, I'll show how AI applies this concept to determine similarity between different entities like cities. First, we'll look at how Noida is closer to Delhi, then see how Chennai can be considered closer to Delhi depending on the context.
For a politician, Noida ≈ Delhi
Let's take a politician like Vadivelu as our first example. For a politician, the closer two cities are, the more similar they are in terms of language, culture, and voting patterns. If you provide a computer with the latitude and longitude of Delhi, Noida, and Chennai, it can calculate the distances between the cities. Based on distance, it can conclude that "Noida" is closer to "Delhi" than "Chennai." Computers excel at calculating exact distances, and machine learning (ML) algorithms use these distances between entities to determine similarity. We train these algorithms to focus only on the characteristics relevant to a specific use case, such as latitude and longitude for physical distance.
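To make this concrete, here is a minimal sketch of that calculation in Python, using the haversine formula for great-circle distance. The coordinates are approximate and hard-coded purely for illustration:

```python
from math import radians, sin, cos, asin, sqrt

# Approximate (latitude, longitude) in degrees; illustrative values only.
CITIES = {
    "Delhi":   (28.61, 77.21),
    "Noida":   (28.54, 77.39),
    "Chennai": (13.08, 80.27),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))  # 6371 km is Earth's mean radius

for city in ("Noida", "Chennai"):
    print(f"Delhi -> {city}: {haversine_km(CITIES['Delhi'], CITIES[city]):.0f} km")
```

Noida comes out roughly 20 km from Delhi, while Chennai is well over 1,500 km away, so by this measure Noida ≈ Delhi.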
For an entrepreneur, Chennai ≈ Delhi
Now, consider an entrepreneur looking to expand their business. They want to find a city similar to "Delhi." In this scenario, you would provide different characteristics like "tier" (the size of the city) and "population." This data would show that "Chennai" is more similar to "Delhi" than "Noida" because both Chennai and Delhi are tier-1 cities with similar populations. In this case, the 'distance' doesn't refer to physical proximity but rather to how similar they are in characteristics like population and city size. This ability to find similarities using machine learning helps computers perform pattern matching. This is similar to how humans do pattern recognition and represents a foundational step towards achieving human-like intelligence.
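The same distance idea works for the entrepreneur; only the characteristics change. The sketch below uses a hypothetical feature table (city tier and rough population figures, not official statistics) and scales each feature to [0, 1] so neither one dominates the distance:

```python
# Hypothetical features: (city tier, population in millions).
# Figures are rough illustrations, not official statistics.
FEATURES = {
    "Delhi":   (1, 32.0),
    "Chennai": (1, 11.0),
    "Noida":   (2, 0.7),
}

def normalize(features):
    """Scale each feature column to [0, 1] so no single one dominates."""
    cols = list(zip(*features.values()))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return {
        city: [(v - l) / (h - l) for v, l, h in zip(vals, lo, hi)]
        for city, vals in features.items()
    }

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

scaled = normalize(FEATURES)
for city in ("Chennai", "Noida"):
    print(f"Delhi -> {city}: {euclidean(scaled['Delhi'], scaled[city]):.2f}")
```

With these features, Chennai ends up much closer to Delhi than Noida does, even though it is physically more than a thousand kilometres away.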
Narrow AI
The characteristics provided to an ML algorithm, such as latitude and longitude, depend on the use case: whether it's for a politician competing in an election or an entrepreneur expanding their company. In both cases, calculating similarity requires a lot of data, such as knowing the details of all the cities. If you don't have enough data, the algorithm won't work correctly. This is known as the "cold start problem" in machine learning. This is the problem we tried to solve at our startup, Guesswork, back in 2013. By focusing on one specific use case (customer information), we could pre-train our ML model and solve the cold start problem. Hence, these models were called narrow AI.
General Purpose AI
The current AI era has given rise to a new breed of models called Large Language Models (LLMs). These models solve the cold start problem differently. They are trained on vast amounts of information available on the internet. They understand thousands of characteristics about each word, such as "Delhi" or "Chennai." Instead of using simple two-dimensional charts, LLMs represent words like "Delhi" and "Chennai" in a multidimensional space, where each dimension captures a specific characteristic of the cities. They can then calculate the distance between words in that space to understand similarities—even without specific training data.
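A toy version of that idea, with only four hand-labelled dimensions instead of the thousands of learned ones in a real LLM (the vectors below are invented for illustration):

```python
# Toy "embeddings": each dimension is a hand-labelled characteristic.
# Real LLM embeddings have thousands of unlabelled dimensions learned from text.
# Dimensions: [northern-ness, metro size, coastal, Hindi-speaking]
EMBEDDINGS = {
    "Delhi":   [0.9, 0.9, 0.0, 0.9],
    "Noida":   [0.9, 0.3, 0.0, 0.9],
    "Chennai": [0.1, 0.8, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Similarity based on the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

for city in ("Noida", "Chennai"):
    print(f"Delhi ~ {city}: {cosine_similarity(EMBEDDINGS['Delhi'], EMBEDDINGS[city]):.2f}")
```

Cosine similarity measures the angle between two vectors: 1 means pointing the same way, 0 means unrelated. In this toy space, Delhi sits nearer to Noida overall; steering the model to emphasize some dimensions over others is the challenge discussed next.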
But there is still a challenge: how do you make sure the model focuses only on certain characteristics? For example, if you’re a politician trying to find a similar city for an election, the model should focus on latitude and longitude to calculate the distance. But if you’re an entrepreneur, it should focus on tier and population. How do you do that?
Prompting: With prompting, you instruct the model in plain English to focus on the right characteristics. For example, if a politician asks the model to find a city similar to Delhi for an election, the model might consider that latitude and longitude are important because being close could influence political support. Therefore, it would suggest Noida over Chennai.
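A sketch of what those instructions might look like as prompt strings (no model call is shown, and the wording is entirely hypothetical):

```python
# Hypothetical prompt templates; the wording is illustrative, not a real API.
def build_prompt(role, priorities):
    return (
        f"You are advising a {role}. When judging similarity, weigh {priorities} "
        "most heavily. Which Indian city is most similar to Delhi, and why?"
    )

politician_prompt = build_prompt("politician", "geographic proximity and voter culture")
entrepreneur_prompt = build_prompt("entrepreneur", "city tier and population size")
print(politician_prompt)
print(entrepreneur_prompt)
```

The same underlying model receives both prompts; only the stated priorities change which characteristics it weighs when measuring distance.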
Fine-Tuning: In fine-tuning, you train the model with examples. This might look similar to ML models from the past, but it is different. Traditional ML models start with no knowledge, whereas fine-tuned models already have knowledge from the internet and can focus on the characteristics mentioned in the examples. In our case, the model knows about the language and culture of the cities, but it will pay more attention to latitude and longitude because we emphasized them in the examples. In short, traditional ML models need many examples, and prompting without examples can be unpredictable. Fine-tuning balances both, giving predictable results with fewer examples.
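Fine-tuning data is typically a small set of input/output pairs. The exact format differs between providers, so the JSONL sketch below is only a hypothetical shape:

```python
import json

# Hypothetical training pairs that implicitly emphasize geographic proximity.
# Field names and file format vary between fine-tuning APIs.
examples = [
    {"prompt": "Which city is most similar to Delhi for an election?", "completion": "Noida"},
    {"prompt": "Which city is most similar to Mumbai for an election?", "completion": "Thane"},
]

jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

Each pair demonstrates the desired behaviour rather than stating a rule; the model infers from the examples which characteristics to emphasize.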
But LLMs can do more than just find similar cities—they can also explain why. For example, they can explain why Chennai might be better for starting a business than Noida, all in plain English. This works a bit like Google Translate. In Google Translate, you provide a fact about Delhi and ask it to translate it into another language. With LLMs, you provide a fact about Delhi and ask it to transform that fact for Chennai. This kind of transformation is called generative AI, which is used in tools like ChatGPT.
Summary
Computers follow exact rules, so we call them deterministic. But when computers understand ≈, we call them probabilistic. This is because they calculate the distance between two entities and conclude that they are “probably” similar. As long as a computer can measure how "close" two entities are, it can determine if they are similar, understand their connection, and provide meaningful answers. This simple idea of similarity (≈) was a major stepping stone for computers, enabling them to make decisions that are not just logical but also intuitive—much like humans do. AI has taken that concept and turned it into something magical—whether it's answering our questions or helping us make decisions—reshaping the world as we know it.