RAG Analytics Explored

Parametric vs Non-Parametric Models

May 15, 2024


In the world of data, it’s like we’re on a hunt for treasure, with data being the new gold rush. But finding the real, pure gold isn’t easy; there are limits to what we can get our hands on. So, how do we even begin to make sense of it all?

That’s where statistics comes into play. It’s like our trusty guide, helping us navigate through the maze of data. We start by grabbing a chunk of data that interests us, like staking a claim in a gold mine. This chunk gives us a peek into the bigger picture, like finding a nugget that hints at a much larger vein of gold.

But we’re not satisfied with just a glimpse. We’re after the real deal — the truth hiding within the data. So, we use statistics to make educated guesses about the whole batch of data based on what we’ve got.

It’s like guessing how big the entire gold vein might be based on the size of that nugget we found.

As we dig deeper into the data, we come across some assumptions — ideas about how the data behaves and what it looks like. These assumptions guide us as we crunch the numbers and try to make sense of it all.

But through it all, our goal remains the same: to uncover the truth buried beneath the surface of the data. It’s a journey fueled by curiosity and driven by the promise of discovery. And with statistics as our trusty companion, we’re well-equipped to tackle whatever challenges come our way.

What is a parameter?

A parameter is a characteristic or feature of a system or model that can be quantitatively measured, estimated, or manipulated.

  1. Parameters in Functions: Consider a general function f(x; θ), where x is the input variable and θ represents a parameter (or set of parameters) that influences the behavior of the function. In this context, θ defines the shape, position, or other characteristics of f. For example, in the linear function f(x; θ) = θ₁x + θ₀, the parameters θ₁ and θ₀ determine the slope and intercept of the line, respectively.
  2. Parameters in Statistical Distributions: Statistical distributions are often characterized by parameters that describe their shape, central tendency, and dispersion. For instance, in a normal distribution N(μ, σ²), μ represents the mean (average) of the distribution and σ² represents the variance (spread). These parameters dictate the location and spread of the distribution on the real number line (both ideas are sketched in code below).
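Here is a minimal Python sketch, with purely illustrative values, showing how parameters control both a function and a distribution:

```python
# Minimal sketch: parameters steer a function's behavior and a
# distribution's shape. All numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(seed=0)

# Parameters of the linear function f(x; θ) = θ1*x + θ0
theta1, theta0 = 2.0, 5.0              # slope and intercept
f = lambda x: theta1 * x + theta0
print(f(3.0))                          # -> 11.0

# Parameters of a normal distribution N(μ, σ²)
mu, sigma = 10.0, 2.0                  # mean and standard deviation
samples = rng.normal(loc=mu, scale=sigma, size=1_000)
print(samples.mean(), samples.std())   # close to 10.0 and 2.0
```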

Parameters can be estimated from data using statistical techniques such as maximum likelihood estimation (MLE) or Bayesian inference. Once estimated, these parameters can be used to make predictions, infer properties of the underlying system, or perform statistical analysis.
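As a concrete illustration, here is what MLE looks like for the normal distribution, where the maximum-likelihood estimates conveniently have closed forms (the sample mean and the mean squared deviation):

```python
# Sketch: maximum likelihood estimation of N(μ, σ²) from data.
# For the normal distribution, the MLE has a closed form:
# mu_hat = sample mean, sigma2_hat = mean squared deviation.
import numpy as np

rng = np.random.default_rng(seed=1)
data = rng.normal(loc=10.0, scale=2.0, size=500)   # observed sample

mu_hat = data.mean()                               # MLE of the mean
sigma2_hat = ((data - mu_hat) ** 2).mean()         # MLE of the variance

print(f"mu_hat={mu_hat:.2f}, sigma2_hat={sigma2_hat:.2f}")  # ~10 and ~4
```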

Quantities like averages, spreads, and ratios are key values we call “parameters” when we talk about a whole group. But since we usually can’t survey everyone, we don’t know these exact parameters. Instead, we can estimate them using our sample data. These estimated values from our sample are what we call “statistics.” So, think of statistics as estimates of parameters.

Now, parametric methods in statistics make guesses about the shape of the data’s distribution, like assuming it’s normal, and also about its key features, such as averages and spreads. On the flip side, non-parametric methods don’t make many assumptions about the data’s distribution or its features. They’re more flexible that way.
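To make that distinction concrete, here is a small SciPy sketch (with made-up data) that checks the normality assumption before choosing between a parametric t-test and a non-parametric rank-based test:

```python
# Illustrative sketch: verify a parametric assumption (normality)
# before picking a test; fall back to a non-parametric one otherwise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
a = rng.normal(10, 2, size=25)   # two small synthetic samples
b = rng.normal(11, 2, size=25)

# Shapiro-Wilk tests the null hypothesis that a sample is normal.
if stats.shapiro(a).pvalue > 0.05 and stats.shapiro(b).pvalue > 0.05:
    result = stats.ttest_ind(a, b)       # parametric: assumes normality
else:
    result = stats.mannwhitneyu(a, b)    # non-parametric: rank-based
print(result)
```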

Parametric vs Non-Parametric Models

Now, let’s consider examples:

  1. Parametric Model Example: Linear Regression. Suppose you have a dataset containing information about house prices (the target variable) and features such as square footage, number of bedrooms, and location. Linear regression assumes a linear relationship between these features and the house price, estimating a coefficient for each feature to minimize the error between predicted and actual prices (a minimal sketch follows the list of parametric models below).

Common parametric models include: Linear Regression, Logistic Regression, Generalized Linear Models (GLM), ANOVA (Analysis of Variance), ANCOVA (Analysis of Covariance), Multinomial Logistic Regression, Poisson Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Survival Analysis Models (e.g., the Cox Proportional Hazards Model).
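A minimal scikit-learn sketch of the linear-regression example; the dataset, coefficients, and noise level are all synthetic and purely illustrative:

```python
# Parametric example: linear regression on synthetic house prices.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=2)
sqft = rng.uniform(500, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([sqft, bedrooms])
# Hypothetical ground truth: price = 150*sqft + 10000*bedrooms + noise
y = 150 * sqft + 10_000 * bedrooms + rng.normal(0, 20_000, size=200)

model = LinearRegression().fit(X, y)
# The fitted coefficients ARE the model's entire memory of the data.
print(model.coef_, model.intercept_)
print(model.predict([[2000.0, 3.0]]))  # price for a 2000 sqft, 3-bed house
```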

  2. Non-Parametric Model Example: K-Nearest Neighbors (KNN). In the same housing-price prediction task, KNN makes no assumptions about the functional form of the relationship between features and price. Instead, it predicts the price of a new house by averaging the prices of its k nearest neighbors in the feature space. KNN doesn’t estimate parameters; it uses the training data directly for predictions (see the sketch after the list of non-parametric models below).

Common non-parametric models include: K-Nearest Neighbors (KNN), Decision Trees (e.g., CART, Random Forests), Support Vector Machines (SVM), Kernel Density Estimation (KDE), Gaussian Processes, Isotonic Regression, Locally Weighted Scatterplot Smoothing (LOWESS), Rank-Based Tests (e.g., the Wilcoxon Rank-Sum Test), the Non-Parametric Bootstrap, and Nearest Neighbor Regression.
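And a matching sketch of the KNN example on the same kind of synthetic data; note that nothing here resembles a fitted coefficient:

```python
# Non-parametric example: KNN regression on synthetic house prices.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(seed=2)
sqft = rng.uniform(500, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([sqft, bedrooms])
y = 150 * sqft + 10_000 * bedrooms + rng.normal(0, 20_000, size=200)

# No coefficients are estimated; the model keeps the training data
# and averages the k nearest neighbors at prediction time.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[2000.0, 3.0]]))  # mean price of the 5 most similar houses
```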

Parametric vs Non-Parametric Memory

Parametric memory, in the context of machine learning, typically refers to models that store information about the data they have encountered in a fixed set of parameters. These parameters are learned during the training phase and remain constant during inference. Think of it as storing knowledge about the data in a predefined format. For example, in a parametric model like linear regression, the learned coefficients serve as the memory of the model. It’s like having a fixed set of rules to interpret and predict based on the data seen during training.

On the other hand, non-parametric memory operates differently. Instead of storing information in a fixed set of parameters, non-parametric methods retain the entire dataset or a subset of it for making predictions. This means that the memory of the model grows with the size of the dataset. K-nearest neighbors (KNN) is a prime example of a non-parametric model. It doesn’t distill the information into parameters but rather stores the entire training data or a part of it. When a prediction is required, it searches through this stored data to find the most similar instances to the input and makes predictions based on their properties. In essence, non-parametric memory can be seen as the model “remembering” the entire dataset rather than summarizing it into a set of rules.
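A small illustrative contrast: no matter how large the training set gets, a fitted line is remembered as two numbers, while a KNN-style model has to keep every point around:

```python
# Sketch: parametric memory stays fixed; non-parametric memory grows.
import numpy as np

rng = np.random.default_rng(seed=3)
for n in (100, 10_000):
    x = rng.uniform(0, 10, size=n)
    y = 3.0 * x + 1.0 + rng.normal(0, 1, size=n)

    # Parametric: the fitted line is just two numbers, regardless of n.
    slope, intercept = np.polyfit(x, y, deg=1)
    parametric_memory = 2

    # Non-parametric (KNN-style): every training point must be stored.
    nonparametric_memory = x.size + y.size

    print(f"n={n}: parametric stores {parametric_memory} values, "
          f"non-parametric stores {nonparametric_memory}")
```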

RAG in the Picture

Retrieval-augmented generation (RAG) is a recipe for models that fuse pre-trained parametric and non-parametric memory for language generation.

In RAG models, the parametric memory comprises a pre-trained seq2seq model, while the non-parametric memory consists of a dense vector index of Wikipedia, retrievable through a pre-trained neural retriever.

  • The parametric memory is trained end-to-end along with the rest of the model using backpropagation, allowing it to adapt to the specific task through gradient-based optimization.
  • Non-parametric memory refers to external knowledge sources that are not directly modifiable during training but can be accessed during inference.
  • In RAG, non-parametric memory is represented by a large collection of text passages or documents, typically extracted from the web or other corpora.
  • These passages serve as a knowledge base that the model can query to retrieve relevant information during generation.
  • RAG uses an efficient retrieval mechanism (e.g., dense retriever) to search through this non-parametric memory and retrieve relevant passages given a query.
  • The retrieved passages are then combined with the input to augment the generation process, providing the model with additional context and information (a toy sketch of this loop follows).
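The toy sketch below mimics this retrieve-then-augment loop. In the actual RAG paper the retriever is a neural model (DPR) over a dense Wikipedia index and the generator is a pre-trained seq2seq model (BART); here, simple bag-of-words vectors and string formatting stand in for both:

```python
# Toy retrieve-then-augment loop (illustrative stand-in for RAG).
import numpy as np

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "KNN is a non-parametric model that stores its training data.",
    "RAG combines parametric and non-parametric memory.",
]

def embed(text, vocab):
    # Hypothetical stand-in for a dense neural embedding.
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

vocab = {w: i for i, w in enumerate(
    sorted({w for p in passages for w in p.lower().split()}))}
index = np.stack([embed(p, vocab) for p in passages])  # the "dense index"

query = "Where is the Eiffel Tower located?"
q = embed(query, vocab)
# Cosine similarity between the query and every indexed passage.
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
top_passage = passages[int(scores.argmax())]           # retrieval step

# Augment the generator's input with the retrieved passage.
prompt = f"context: {top_passage}\nquestion: {query}"
print(prompt)  # a seq2seq generator would produce the answer from this
```

In the real system both halves are trained jointly: gradients flow into the generator and the query encoder, while the passage index itself stays frozen.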

In a Nutshell

  1. Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.
  2. Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.
  3. The parametric assumption of normality is particularly worrisome for small sample sizes (n < 30). Nonparametric tests are often a good option for these data.
  4. Memory: parametric models use a fixed amount of storage; non-parametric memory grows in proportion to dataset size.
  5. Generalization: parametric models are concise; non-parametric models risk overfitting.
  6. Efficiency: parametric prediction is computationally cheap; non-parametric prediction entails searching the stored data.
  7. RAG integrates both parametric memory (learnable parameters in the model) and non-parametric memory (external knowledge sources) to tackle knowledge-intensive NLP tasks.
  8. Parametric Memory:
  • Learnable parameters (weights) within the model.
  • Encodes input data (e.g., queries).
  • Decodes and generates output (e.g., responses).

  9. Non-Parametric Memory:
  • External knowledge sources not modifiable during training.
  • Large collection of text passages or documents.
  • Retrieved passages augment the generation process.
  10. For further reading: Conover, W.J. (1980). Practical Nonparametric Statistics. New York: Wiley & Sons.
  11. RAG paper: Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
