Thompson Sampling : Data Science Concepts
The coolest Multi-Armed Bandit solution!
Multi-Armed Bandit Intro : • Multi-Armed Bandit : D...
Table of Conjugate Priors:
en.m.wikipedia.org/wiki/Conju...
My Patreon : www.patreon.com/user?u=49277905
Comments: 47
Man, what a good explanation! I was looking for Bayesian regression and found your video on it, got it. Now I searched for Thompson sampling and it's your channel again! You're saving my day hahaha. Very clear and insightful explanations. Thank you very much!
Your explanation made me say WOW!
I really like your videos! Your explanations are so much better than the ones given by my professors!
@ritvikmath
2 years ago
Thanks!
Very neat, first time I come across Thompson’s sampling!
Very well explained video, helped me a lot!
Very clear explanation! Thank you so much!
I like the very clear explanation with a reference to the math details for those who want that. Also appreciate the limitations at the end. Thinking of applications to portfolio optimization
@ritvikmath
3 years ago
Thanks!
This is pretty awesome, thanks for the great explanation!
@ritvikmath
3 years ago
Thanks!
Excellent video - thank you!
Beautiful explanation! Had come across Thompson Sampling during Udemy's online course on Recommender Systems.
Cool video! There are a lot of videos about DS implementation, but I find this channel provides lots of the math foundations behind the scenes. While a good implementation is important, I believe the theoretical foundation is also very cool and is crucial to a successful analysis.
@ritvikmath
3 years ago
Thanks!
great explanation!!!
Damn bro. You are good at this
Loved the explanation! Before this video I thought I could never learn TS. Thank you :)
This is really interesting. Never heard of it.
Amazing. Thanks 😉
The posteriors that emerge from the formulas have a standard deviation of about 1 after one visit. Does this result depend on the fact that the restaurant qualities actually have a known standard deviation of 1?
Can you produce a video explaining the moving least squares method? Thank you in advance.
Awesome!!!
How do you pick whether the next visit goes to restaurant 1 or 2? Could you explain that?
Could you please also have a video on Importance Sampling?
Absolutely fantastic content once again, many thanks! However, I have one important question: you never revisited the assumption that we know sigma_i beforehand, even though in practice it's an unobservable quantity. What should one do about it? Is estimating it from historical data (if such data are available) a big no-no?
Can you please mention the source you studied for this video? Like a journal paper or a textbook you followed. It would help me a lot. Thanks!
How do you get the initial posterior distributions of 20 and -12?
You are the best
What about CRFs? Are you able to cover them?
Is one shortcoming of this method that the variance of the posterior does not scale to the sample variance of the observations for that restaurant? Like, if I went to Restaurant A 50 times and Restaurant B 50 times, and my sample values from Restaurant A were distributed N(5, 1) but my sample values from Restaurant B were distributed N(6, 10), then you would think that my posterior for Restaurant B should have much wider variance than my posterior for Restaurant A. But Thompson Sampling doesn't seem to account for that, instead just scaling posterior variance by the number of observations per restaurant. Am I missing something here?
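That's a fair concern: with a known, fixed observation variance, the posterior variance depends only on the visit count, not on the observed spread. If you want the spread learned as well, the standard conjugate choice is a Normal-Inverse-Gamma prior on (mean, variance). A minimal sketch, not from the video; the function names and hyperparameter defaults are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_params(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Normal-Inverse-Gamma conjugate update for unknown mean AND variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n == 0:
        return mu0, kappa0, alpha0, beta0   # no data: posterior is the prior
    xbar = x.mean()
    kappa = kappa0 + n
    mu = (kappa0 * mu0 + n * xbar) / kappa
    alpha = alpha0 + n / 2
    beta = (beta0 + 0.5 * np.sum((x - xbar) ** 2)
            + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa))
    return mu, kappa, alpha, beta

def sample_mean(x):
    """Thompson draw: sample a variance, then a mean given that variance."""
    mu, kappa, alpha, beta = posterior_params(x)
    sigma2 = 1.0 / rng.gamma(alpha, 1.0 / beta)   # Inverse-Gamma draw
    return rng.normal(mu, np.sqrt(sigma2 / kappa))

# Restaurant B's noisier rewards inflate beta, widening its posterior
steady = [5.0, 5.0, 5.0, 5.0]
noisy = [1.0, 11.0, -2.0, 14.0]
print(posterior_params(steady)[3] < posterior_params(noisy)[3])   # True
```

Under the video's setup, where the observation variance is fixed at 1, restaurants A and B really would end up with the same posterior width after 50 visits each; the scheme above lets B's wider sample spread feed back into wider, more exploratory sampling.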
Is it correct to multiply by sigma squared in the posterior formula? It seems we should multiply by sigma only, otherwise we get the wrong scale: a squared length instead of a length.
I wonder if anybody can bring some "Explore-Exploit" thinking to this. Here, Thompson sampling arrives at the optimal solution provided that the 'environment' (restaurant quality) is constant. But what about a changing environment (say, restaurants occasionally going under new management)? In that case, it seems that time spent exploring should always remain higher than it would in a constant environment. Is there an analogous sampling routine for such a situation?
@nickmillican22
3 years ago
Been thinking about this. I may have a partial solution. Since, once sufficient data is available, the 'better' option might always outcompete the 'lesser' option, a change to the environment that makes the lesser option the better one will go undetected. So perhaps the goal is to increase the uncertainty in the posteriors in proportion to the number of future events. One way (I think) to do this would be to weight the data by something like [1 / total planned visits to any restaurant]. In this way, much of the 'uninformation' of the prior is maintained, permitting increased exploration. But even if this is okay, what do you do if you plan to visit restaurants infinitely many times?
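A standard answer from the non-stationary bandit literature is discounting: decay the sufficient statistics by a factor gamma < 1 every round, so an unvisited arm's posterior gradually widens back toward the prior and keeps getting re-explored, even over an infinite horizon. A rough sketch, assuming the video's Gaussian setup with known observation variance (gamma and the toy numbers here are my own):

```python
import numpy as np

rng = np.random.default_rng(42)

def discounted_thompson(true_means, gamma=0.99, obs_var=1.0, prior_var=100.0):
    """Gaussian Thompson sampling where past evidence decays by gamma each
    round, so posteriors stay wide enough to re-detect a changed environment."""
    d = len(true_means[0])
    eff_n = np.zeros(d)     # discounted visit counts
    eff_sum = np.zeros(d)   # discounted reward sums
    picks = []
    for t in range(len(true_means)):
        post_var = 1.0 / (1.0 / prior_var + eff_n / obs_var)
        post_mean = post_var * eff_sum / obs_var   # zero prior mean assumed
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(true_means[t][arm], np.sqrt(obs_var))
        eff_n *= gamma            # all arms forget a little every round...
        eff_sum *= gamma
        eff_n[arm] += 1           # ...and only the visited arm gains evidence
        eff_sum[arm] += reward
        picks.append(arm)
    return picks

# Restaurant quality flips halfway: arm 0 starts better, arm 1 ends better
means = [[6.0, 5.0]] * 1000 + [[5.0, 6.0]] * 1000
picks = discounted_thompson(means)
```

With gamma = 1 this reduces to ordinary Thompson sampling, where the mid-stream quality switch can go undetected for a very long time; with gamma < 1 the sampler shifts to the newly better arm within a few hundred rounds.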
Do you have the article you mentioned (4:42) with the table of prior/posterior distributions?
@constantin1481
3 years ago
He was possibly referring to this paper: statweb.stanford.edu/~cgates/PERSI/papers/conjprior.pdf
@ritvikmath
3 years ago
Just linked! Sorry bout that
@charlessimmons3709
3 years ago
@@ritvikmath Thanks!
Actually, you had the board covered for the entire video. Couldn't take an unobstructed photo this time.
@ritvikmath
3 years ago
Sorry! Will try to remember that
Do you know Top-Two Thompson Sampling?
Ritvik you are cool
@ritvikmath
3 years ago
Wow thanks!
8:47 "we sample from those posteriors" You mean "priors"?
@TonkatsuChickenTJ
2 years ago
On the first visit the posterior is equal to the prior, so sampling from the posteriors is correct in general.
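Plugging zero observations into the conjugate update gives back the prior, so the first round's "posterior" draw really is a draw from the prior. A quick check, assuming the video's setup of a wide Gaussian prior and observation variance 1 (the specific numbers are placeholders):

```python
def gaussian_posterior(rewards, prior_mean=0.0, prior_var=100.0, obs_var=1.0):
    """Conjugate update for a Gaussian mean with known observation variance."""
    n = len(rewards)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(rewards) / obs_var)
    return post_mean, post_var

print(gaussian_posterior([]))       # no data: posterior equals the prior
print(gaussian_posterior([20.0]))   # one visit: variance already near obs_var
```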
You can use this Python code:

# Thompson Sampling with Gaussian posteriors

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')

# Implementing Thompson Sampling
N = 1000                                 # number of rounds
d = 10                                   # number of ads
ads_selected = []
numbers_of_selections = np.zeros(d)      # Ni(n)
prior_variance = 1e2 ** 2                # N(0, 100^2) prior on each mean
variance_posterior = [prior_variance] * d
mean_posterior = [0.0] * d
sum_sample = [0.0] * d

for n in range(N):
    # Sample one value from each ad's posterior and pick the argmax;
    # np.random.normal expects the standard deviation, so take the sqrt
    samples = [np.random.normal(mean_posterior[i], np.sqrt(variance_posterior[i]))
               for i in range(d)]
    ad = int(np.argmax(samples))
    ads_selected.append(ad)
    numbers_of_selections[ad] += 1
    reward = dataset.values[n, ad]
    sum_sample[ad] += reward
    # Conjugate Gaussian update, observation variance assumed to be 1
    variance_posterior[ad] = 1 / (1 / prior_variance + numbers_of_selections[ad])
    mean_posterior[ad] = variance_posterior[ad] * sum_sample[ad]

# Visualising the results - Histogram
plt.hist(ads_selected)
plt.title('Histogram of ads selections')
plt.xlabel('Ads')
plt.ylabel('Number of times each ad was selected')
plt.show()
print(numbers_of_selections[4] / numbers_of_selections.sum() * 100)