Graph Representation Learning (Stanford University)

Science and technology

Slide link: snap.stanford.edu/class/cs224w...

Comments: 50

  • @deterministicalgorithmslab1744
    4 years ago

    NOTES / Important points:
    1.) Graphlets are undirected motifs.
    2.) The purpose of graph representation learning is to allow automatic feature engineering by embedding each node into a d-dimensional space, such that similarity in the network (*) corresponds to closeness (cosine/Euclidean distance) in the embedding space.
    3.) Images == grid graphs; text, audio == chain graphs. The adjacency-matrix representation of a graph is not invariant to permutations of its rows/columns, but the graph itself is.
    4.) (*) We define this similarity in the network on the basis of random walks: similarity(u, v) = probability of visiting node v on a random walk starting at node u. The similarity depends on the random-walk strategy R that you choose to use.
    5.) You take fixed-size random walks from each node to build its neighborhood sets.
    6.) You try to maximize the probability of the neighborhood sets given the node, where the probability of node v occurring in the neighborhood set of node u (which depends on the similarity between the embeddings of u and v) is defined as a softmax over the similarities between the embedding of node u and the embeddings of all other nodes. [26:20]
    7.) We use negative sampling to approximate the softmax, whose denominator is expensive to compute.
    8.) In negative sampling, nodes are sampled randomly in proportion to their degree.
    9.) But random walks from a node only encode similarities that correspond to "nearness" of two nodes in the network, so how do we encode other types of similarity?
    10.) Node2Vec:
    11.) We can have two different types of exploration: a DFS-like walk (macro view) or a BFS-like walk (micro view).
    12.) More precisely, 2nd-order random walks (i.e., the walk has one unit of memory, which tells you where you came from to reach the current node). You also need the distances, from that previous node, of all nodes in the vicinity of the current node. You then choose to move back towards the previous node, away from it, or stay at the same distance from it, with weights in the ratio 1/p : 1/q : 1 respectively. [45:00] (See the small code sketch after this comment.)
    13.) Depending on the values of p and q we can get embeddings where communities of nodes end up nearby in the embedding space, or where nodes with similar structural roles end up close together. [48:02]
    14.) These embeddings are very robust: removing/adding ~50% of the edges leads to only a small drop in accuracy on the node-classification problem.
    15.) Node embeddings can be aggregated over several nodes to classify a set of nodes.
    16.) It has been observed that Node2Vec performs better on node classification than on link prediction.
    17.) You must choose the definition of node similarity (random-walk neighborhood, possibly with p and q; plain neighborhood; etc.) that matches your application.
    18.) EMBEDDING ENTIRE GRAPHS:
    19.) For molecules, or wherever lots of small networks are available.
    20.) Just compute the usual node embeddings and sum all of them.
    21.) Make a super-node that connects to a sub-graph; the embedding of the super-node is the embedding of the sub-graph.
    22.) Anonymous walks: these are transformations of random walks that are independent of which nodes actually occur on the path, but try to capture the local structure around a node through the repetitions that occur along the walk. So A->B->C is transformed to 1->2->3, A->B->A to 1->2->1, and so on. The number of possible anonymous walks grows exponentially with the number of steps in the random walk and with the size of the set of nodes reachable from a particular node.
    23.) Approach 1: graph feature vector = vector of frequencies of the different possible anonymous walks.
    24.) How many walks are needed to get a good estimate? [1:08:10]
    25.) Approach 2: learn embeddings of all possible anonymous walks (of fixed length) and concatenate/average them to get the embedding of the graph. To learn these anonymous-walk embeddings, try to predict the probability distribution over the next random walk from a node, using the embeddings of the anonymous walks corresponding to the previous [delta] random walks, as in [1:14:02].
    26.) Anonymous walks are based on the assumption that all important properties/structures of the graph can be recovered from the statistics of the random anonymous walks that can be taken from the various nodes.
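
    A minimal Python sketch of points 12 and 22 above (an illustration, not code from the lecture): a node2vec-style biased 2nd-order walk and the anonymous-walk transformation. The toy graph, function names, and p/q values are made up.

        import random

        def biased_walk(adj, start, length, p=1.0, q=1.0):
            # node2vec-style 2nd-order walk on an undirected graph given as a dict of
            # neighbor sets. From the current node, a neighbor is weighted 1/p if it is
            # the node we just came from, 1 if it stays at the same distance from that
            # previous node, and 1/q if it moves further away.
            walk = [start]
            while len(walk) < length:
                cur = walk[-1]
                nbrs = sorted(adj[cur])
                if not nbrs:
                    break
                if len(walk) == 1:                      # no previous node yet: uniform step
                    walk.append(random.choice(nbrs))
                    continue
                prev = walk[-2]
                weights = [(1.0 / p) if n == prev       # return to the previous node
                           else 1.0 if n in adj[prev]   # stay at distance 1 from it
                           else (1.0 / q)               # move out to distance 2
                           for n in nbrs]
                walk.append(random.choices(nbrs, weights=weights)[0])
            return walk

        def anonymize(walk):
            # Anonymous walk: replace each node by the index of its first appearance,
            # so A->B->A becomes [1, 2, 1] regardless of node identities.
            first_seen = {}
            return [first_seen.setdefault(node, len(first_seen) + 1) for node in walk]

        # Tiny toy graph (made-up data, for illustration only).
        adj = {'A': {'B', 'C'}, 'B': {'A', 'C', 'D'}, 'C': {'A', 'B'}, 'D': {'B'}}
        w = biased_walk(adj, 'A', length=6, p=0.25, q=4.0)
        print(w, anonymize(w))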

  • @dharshanak4118
    3 years ago

    I never knew Slavoj Zizek used to teach machine learning when he was young

  • @MrSouravmondal

    3 years ago

    :D :D :D

  • @TheMateyl

    3 years ago

    Thicc Slovenian accent

  • @user-rt6rr8pp9x
    4 years ago

    Graphs are so general, explainable, convenient, and elegant. They're also pretty difficult. :)

  • @veselindion2

    4 years ago

    Pretty difficult indeed :D

  • @gunarto90
    4 years ago

    Great lecture and Q&A sessions

  • @linfengyang8834
    4 years ago

    Great lecture, very informative and clear.

  • @MahmoudOuf
    4 years ago

    Great lecture, thanks a lot. The article is also very interesting and informative.

  • @dogaarmangil
    2 years ago

    Great lecture. 37:50 It is a misnomer to call this process "the finding of similarity between nodes" when in fact what you are doing is finding node groups. In a group, nodes are rarely similar to each other: you won't find 2 CEOs in a company, atom nuclei are surrounded by electrons rather than other nuclei, and so on.

  • @feishi4932
    4 years ago

    Great lecture. It helps me a lot.

  • @vikankshnath8068
    4 years ago

    He is such a great explainer

  • @josephzhu5129
    1 year ago

    Great lecture, he knows how to explain complicated ideas, thanks a lot!

  • @jimmorrisshen
    4 years ago

    This is a great talk.

  • @tomlarey
    4 years ago

    Great lecture, props to Jure Leskovec :)

  • @muhammadaneeqasif572
    1 month ago

    Amazing, great to see some good content again. Thanks, YouTube algorithm; keep it up.

  • @yandajiang1744
    6 months ago

    Awesome explanation

  • @arefy100
    2 years ago

    Great lecture, thanks a lot

  • @lloydacquayethompson6789
    4 years ago

    This guy is awesome

  • @christophchristoph9909
    2 years ago

    awesome! thanks for sharing!

  • @MrSajjadathar
    4 years ago

    Which video did you upload yesterday?

  • @ClosiusBeg
    3 years ago

    What if we make the number of random walks also random? And simulate n random walks with a constant number of walks, over m random constant numbers of walks?

  • @sm_xiii
    4 years ago

    @Machine Learning TV Can you please give some numbering to the lecture videos, so that we can figure out in which order we should watch them?

  • @Alkis05

    3 years ago

    That is assignment number 3: create a program that goes through the lectures and sorts them based on the downloaded transcripts.

  • @muhammadibrahimnejadian9265
    2 years ago

    Thanks a lot for sharing

  • @soumyadbanik
    2 years ago

    What kinds of data can have graph representations, and what types of data cannot be represented as a graph?

  • @hossein.amirkhani
    4 years ago

    Great lecture. But just one point: in the DeepWalk part, he frequently says that he wants to maximize the cost function, but really he wants to minimize it. In addition, his explanation of negative sampling is not completely correct. In fact, he should explain it by mentioning that the multi-class problem is changed to a two-class problem.

  • @LeenaBora

    4 years ago

    Hi Hossein, yes, it was indeed a great lecture. I also didn't understand the negative sampling part. Can you please elaborate on it?

  • @ghensao4027

    2 years ago

    Exactly. log(probability) ranges from negative infinity to 0, so taking its negative gives a range of 0 to infinity. You want to minimize that instead.
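
    A small sketch of that two-class view (illustrative only; the embedding values and k are made up): for a node u and a node v observed in its walk neighborhood, the per-pair loss treats (u, v) as the positive class and k nodes sampled roughly in proportion to degree as negatives, and this loss is minimized.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def neg_sampling_loss(z_u, z_v, z_negs):
            # Positive pair (u, v): push sigmoid(z_u . z_v) toward 1.
            pos = -np.log(sigmoid(z_u @ z_v))
            # Sampled negatives: push sigmoid(z_u . z_neg) toward 0 for each of them.
            neg = -np.sum(np.log(sigmoid(-(z_negs @ z_u))))
            return pos + neg   # a negative log-likelihood in [0, inf): minimize it

        rng = np.random.default_rng(0)
        z_u, z_v = rng.normal(size=16), rng.normal(size=16)
        z_negs = rng.normal(size=(5, 16))   # k = 5 negatives (degree-based sampling omitted here)
        print(neg_sampling_loss(z_u, z_v, z_negs))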

  • @deepbayes6808
    24 days ago

    32:40 shouldn't this be: log of sum_k instead of sum_k of log?

  • @ShikhaMallick
    4 years ago

    Hi, I don't get the intuition behind softmax. Can someone please help me understand?

  • @muratcan__22

    4 years ago

    You can read the Wikipedia article; I think it is well explained there.

  • @vik24oct1991

    3 years ago

    Just as the sigmoid is a smooth approximation to the sign (or step) function, the softmax is a smooth approximation to the argmax function. With argmax, the output is 1 at the index of the largest input and 0 everywhere else (a one-hot output). With softmax, the output at the index of the largest value is close to 1, but the second- and third-largest indices also get non-zero values based on their inputs. The softmax is a smooth distribution in which every index gets some value greater than zero, but because exponentials are used, if the maximum value is far greater than the others it essentially behaves like argmax. It is used mainly so that the loss function is differentiable, which argmax is not.
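
    A tiny numerical illustration of this (the scores are made up): softmax keeps every entry above zero, but approaches a one-hot argmax as the gaps between scores grow.

        import numpy as np

        def softmax(x):
            e = np.exp(x - np.max(x))   # subtract the max for numerical stability
            return e / e.sum()

        scores = np.array([2.0, 1.0, 0.1])
        print(softmax(scores))          # smooth: roughly [0.66, 0.24, 0.10]
        print(softmax(scores * 10))     # widen the gaps: nearly one-hot, like argmax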

  • @ufuoma833
    3 years ago

    Surprisingly, I can follow his discussion. Or maybe I'm just tripping.

  • @zhenyusha7123
    4 years ago

    Where are the rest of the class videos?

  • @MachineLearningTV

    4 years ago

    Watch the new video we uploaded yesterday.

  • @georgeigwegbe7258

    4 years ago

    @MachineLearningTV Hello, can we get all of the cs224w videos?

  • @0xkellan

    4 years ago

    @georgeigwegbe7258 snap.stanford.edu/class/cs224w-2018/ : find the "resource" part on this page, where you can find lecture notes and videos.

  • @fatcat4791
    3 years ago

    Is this guy from Switzerland?

  • @MCRuCr
    2 years ago

    Random walking is somehow like the random firing of biological neurons

  • @thomasfranzstockhammer7846
    2 years ago

    Lg

  • @distrologic2925
    1 year ago

    This is about learning *from* graphs, but how do we learn the actual graph structures just from raw data? That is the question AI should be answering.

  • @o_lakisgavalas
    2 years ago

    Misleading title, as no (generative) representation is discussed. E.g., "Supervised learning on graphs" or "Learning features of graphs" would have been more suitable, IMHO.

  • @dark_all_day9311
    3 years ago

    Ah nu Cheeki Breeki iv Damke

  • @pr749
    3 years ago

    I think I am only here to listen to his accent.

  • @jensharbers5620

    3 years ago

    I do not think that the accent is disturbing anyone in any way.

  • @pr749

    3 years ago

    @jensharbers5620 Interesting that you interpreted my comment negatively.

  • @constantlistener807

    2 years ago

    He sounds like a young Slavoj Zizek

  • @charlottjakob2948
    4 years ago

    This is a great talk.

