• CS224W Colab 1 Notes


    Contents

    1 Graph Basics

    1.1 Visualizing the karate club graph with networkx

    1.2 Computing the graph's average degree with networkx and rounding to the nearest integer (Question 1)

    1.3 Computing the average clustering coefficient, rounded to 2 decimal places, using the built-in networkx clustering function (Question 2)

    1.4 Computing the PageRank value of node id=0 after one iteration (Question 3)

    1.5 Computing the (raw) closeness centrality (Question 4)

    2 Graph to Tensor

    2.1 Initializing tensors (all ones, all zeros, random values) and converting element dtypes

    2.2 Converting the edges of a networkx graph into a list in which each edge is a 2-element tuple, then converting that list to a torch.LongTensor (Question 5)

    2.3 Implementing a negative-edge sampling function that generates edges absent from the original graph (in an undirected graph an edge and its reverse are the same edge, and self-loops do not count as edges) (Question 6)

    3 Node Embedding Learning

    3.1 Using the torch.nn.Embedding class

    3.2 Visualizing the nn.Embedding instance emb

    3.3 Training the torch.nn.Embedding instance emb (Question 7)


    1 Graph Basics

    This part mainly exercises the networkx package:

    1.1 Visualizing the karate club graph with networkx
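    The notebook works with Zachary's karate club network throughout; G is presumably created with networkx's built-in loader. A minimal sketch, assuming the standard Colab setup:

    import networkx as nx

    # Zachary's karate club network: 34 nodes, 78 edges, undirected
    G = nx.karate_club_graph()
    print(type(G))  # <class 'networkx.classes.graph.Graph'>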

    import pylab

    # Visualize the graph
    nx.draw(G, with_labels=True)
    pylab.show()  # Tip: if the figure drawn by nx.draw does not show up, importing pylab and calling show() can help

    1.2 Computing the graph's average degree with networkx and rounding to the nearest integer (Question 1)

    # Question 1
    def average_degree(num_edges, num_nodes):
        # TODO: Implement this function that takes number of edges
        # and number of nodes, and returns the average node degree of
        # the graph. Round the result to nearest integer (for example
        # 3.3 will be rounded to 3 and 3.7 will be rounded to 4)
        avg_degree = 0
        ############# Your code here ############
        avg_degree = round(num_edges * 2 / num_nodes)
        #########################################
        return avg_degree

    num_edges = G.number_of_edges()
    num_nodes = G.number_of_nodes()
    avg_degree = average_degree(num_edges, num_nodes)
    print("Average degree of karate club network is {}".format(avg_degree))
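    As a sanity check (my addition, not required by the assignment): the karate club network has 34 nodes and 78 edges, so the average degree is 2 * 78 / 34 ≈ 4.59, which rounds to 5. The same value follows from the degree sequence:

    # Cross-check via the degree sequence
    degrees = [d for _, d in G.degree()]
    print(round(sum(degrees) / len(degrees)))  # 5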

    1.3 Computing the average clustering coefficient, rounded to 2 decimal places, using the built-in networkx clustering function (Question 2)

    # Question 2
    def average_clustering_coefficient(G):
        # TODO: Implement this function that takes a nx.Graph
        # and returns the average clustering coefficient. Round
        # the result to 2 decimal places (for example 3.333 will
        # be rounded to 3.33 and 3.7571 will be rounded to 3.76)
        avg_cluster_coef = 0
        ############# Your code here ############
        ## Note:
        ## 1: Please use the appropriate NetworkX clustering function
        avg_cluster_coef = round(nx.average_clustering(G), 2)
        #########################################
        return avg_cluster_coef

    avg_cluster_coef = average_clustering_coefficient(G)
    print("Average clustering coefficient of karate club network is {}".format(avg_cluster_coef))

    1.4 Computing the PageRank value of node id=0 after one iteration (Question 3)

    Hint: a node's neighbors can be iterated with the built-in G.neighbors(node_id), which returns a dict key iterator (dict_keyiterator). One PageRank iteration for node v computes r1(v) = beta * (sum over neighbors u of r0 / deg(u)) + (1 - beta) / N, where N is the number of nodes and every node starts with the same score r0.
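    For example, the iterator can be materialized with list() to inspect a node's neighbors (node 0 is used purely as an illustration):

    print(G.neighbors(0))        # <dict_keyiterator object at ...>
    print(list(G.neighbors(0)))  # neighbor ids of node 0
    print(G.degree(0))           # 16 in the karate club network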

    # Question 3
    def one_iter_pagerank(G, beta, r0, node_id):
        # TODO: Implement this function that takes a nx.Graph, beta, r0 and node id.
        # The return value r1 is one iteration PageRank value for the input node.
        # Please round r1 to 2 decimal places.
        r1 = 0
        ############# Your code here ############
        ## Note:
        ## 1: You should not use nx.pagerank
        for nb in G.neighbors(node_id):
            r1 += 1.0 * r0 / G.degree(nb)
        r1 = r1 * beta + (1.0 - beta) / G.number_of_nodes()
        r1 = round(r1, 2)
        #########################################
        return r1

    beta = 0.8
    r0 = 1 / G.number_of_nodes()
    node = 0
    r1 = one_iter_pagerank(G, beta, r0, node)
    print("The PageRank value for node 0 after one iteration is {}".format(r1))

    1.5 Computing the (raw) closeness centrality (Question 4)

    Note: the built-in networkx closeness function can be used, but it returns the normalized value, i.e., the raw value multiplied by (number of nodes - 1): raw closeness is c(v) = 1 / (sum of shortest-path distances from v), while networkx returns (n - 1) times that. The result here must therefore be divided by (number of nodes - 1) to undo the normalization.

    # Question 4
    def closeness_centrality(G, node=5):
        # TODO: Implement the function that calculates closeness centrality
        # for a node in karate club network. G is the input karate club
        # network and node is the node id in the graph. Please round the
        # closeness centrality result to 2 decimal places.
        closeness = 0
        ############# Your code here ############
        ## Note:
        ## 1: You can use networkx closeness centrality function.
        ## 2: Notice that networkx closeness centrality returns the normalized
        ## closeness directly, which is different from the raw (unnormalized)
        ## one that we learned in the lecture.
        closeness = nx.closeness_centrality(G, node) / (G.number_of_nodes() - 1)
        closeness = round(closeness, 2)
        #########################################
        return closeness

    node = 5
    closeness = closeness_centrality(G, node=node)
    print("The node 5 has closeness centrality {}".format(closeness))
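    As a cross-check (my addition, not part of the assignment), the raw closeness can also be computed directly from shortest-path lengths:

    # Raw closeness: 1 / sum of shortest-path distances from the node
    dist = nx.shortest_path_length(G, source=5)
    print(round(1.0 / sum(dist.values()), 2))  # should match the value above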

    2 Graph to Tensor

    The tensor is PyTorch's basic data type.

    2.1 Initializing tensors (all ones, all zeros, random values) and converting element dtypes

    import torch

    # Generate 3 x 4 tensor with all ones
    ones = torch.ones(3, 4)
    print(ones)
    # Generate 3 x 4 tensor with all zeros
    zeros = torch.zeros(3, 4)
    print(zeros)
    # Generate 3 x 4 tensor with random values on the interval [0, 1)
    random_tensor = torch.rand(3, 4)
    print(random_tensor)
    # Get the shape of the tensor
    print(ones.shape)
    # ones.shape is equivalent to ones.size(); both return a torch.Size object,
    # and shape[0] / shape[1] give the number of rows / columns.
    # Create a 3 x 4 tensor with all 32-bit floating point zeros
    zeros = torch.zeros(3, 4, dtype=torch.float32)
    print(zeros.dtype)
    # Change the tensor dtype to 64-bit integer
    zeros = zeros.type(torch.long)
    print(zeros.dtype)  # dtype of the tensor's elements: torch.int64

    2.2 Converting the edges of a networkx graph into a list in which each edge is a 2-element tuple, then converting that list to a torch.LongTensor (Question 5)

    ① The edges of G can be obtained with G.edges(), which returns an EdgeView; each element of it (i.e., each edge) is a tuple. See the implementation below.

    ② The edge list can be turned into a tensor directly with a tensor constructor.

    Note: the exercise requires the resulting tensor to have shape 2 x number of edges, so tensor.t() is needed to transpose the matrix.

    # Question 5
    def graph_to_edge_list(G):
        # TODO: Implement the function that returns the edge list of
        # an nx.Graph. The returned edge_list should be a list of tuples
        # where each tuple is a tuple representing an edge connected
        # by two nodes.
        edge_list = []
        ############# Your code here ############
        for e in G.edges():
            edge_list.append(e)
        #########################################
        return edge_list

    def edge_list_to_tensor(edge_list):
        # TODO: Implement the function that transforms the edge_list to
        # tensor. The input edge_list is a list of tuples and the resulting
        # tensor should have the shape [2 x len(edge_list)].
        edge_index = torch.tensor([])
        ############# Your code here ############
        edge_index = torch.LongTensor(edge_list).t()
        #########################################
        return edge_index

    pos_edge_list = graph_to_edge_list(G)
    pos_edge_index = edge_list_to_tensor(pos_edge_list)
    print("The pos_edge_index tensor has shape {}".format(pos_edge_index.shape))
    print("The pos_edge_index tensor has sum value {}".format(torch.sum(pos_edge_index)))
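    A quick check of the result (my addition): torch.LongTensor stores 64-bit integers, and the karate club network has 78 edges, so the expected shape is [2, 78].

    print(pos_edge_index.dtype)  # torch.int64
    print(pos_edge_index.shape)  # torch.Size([2, 78])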

    2.3 Implementing a negative-edge sampling function that generates edges absent from the original graph (note that in an undirected graph an edge and its reverse are the same edge, and self-loops do not count as edges) (Question 6)

    Helper: random.randint(0, N) generates a random integer in [0, N], inclusive on both ends.

    # Question 6
    import random

    def myjudge1(G, p, q):
        # (p, q) can be a negative edge only if it is not a self-loop and is
        # absent from the undirected graph in both directions.
        if (p != q and G.has_node(p) and G.has_node(q)
                and not G.has_edge(p, q) and not G.has_edge(q, p)):
            return True
        return False

    def sample_negative_edges(G, num_neg_samples):
        # TODO: Implement the function that returns a list of negative edges.
        # The number of sampled negative edges is num_neg_samples. You do not
        # need to consider the corner case when the number of possible negative edges
        # is less than num_neg_samples. It should be ok as long as your implementation
        # works on the karate club network. In this implementation, self loops should
        # not be considered as either a positive or negative edge. Also, notice that
        # the karate club network is an undirected graph, if (0, 1) is a positive
        # edge, do you think (1, 0) can be a negative one?
        neg_edge_list = []
        ############# Your code here ############
        cas = 0
        while cas < num_neg_samples:
            p = random.randint(0, G.number_of_nodes() - 1)
            q = random.randint(0, G.number_of_nodes() - 1)
            # Skip pairs already sampled; (p, q) and (q, p) are the same undirected edge.
            if myjudge1(G, p, q) and (p, q) not in neg_edge_list and (q, p) not in neg_edge_list:
                neg_edge_list.append((p, q))
                cas += 1
        #########################################
        return neg_edge_list

    # Sample 78 negative edges
    neg_edge_list = sample_negative_edges(G, len(pos_edge_list))
    # Transform the negative edge list to tensor
    neg_edge_index = edge_list_to_tensor(neg_edge_list)
    print("The neg_edge_index tensor has shape {}".format(neg_edge_index.shape))

    # Which of following edges can be negative ones?
    edge_1 = (7, 1)
    edge_2 = (1, 33)
    edge_3 = (33, 22)
    edge_4 = (0, 4)
    edge_5 = (4, 2)
    ############# Your code here ############
    ## Note:
    ## 1: For each of the 5 edges, print whether it can be negative edge
    for a, b in [edge_1, edge_2, edge_3, edge_4, edge_5]:
        print(myjudge1(G, a, b))

    A cleaner alternative: use the built-in nx.non_edges(G) to enumerate all edges absent from G, then use random.sample(range(0, n), numb) to pick numb of them at random.

    Code:

    def sample_negative_edges(G, num_neg_samples):
        # Per the assignment, we need not handle the corner case where
        # num_neg_samples exceeds the number of absent edges.
        # Self-loops are not considered.
        # Reverse edges would normally need handling, but nx.non_edges never
        # yields the same node pair twice, so no extra check is required.
        neg_edge_list = []
        # All edges absent from G (one orientation per pair; no reverse duplicates)
        non_edges_one_side = list(enumerate(nx.non_edges(G)))
        # Sample num_neg_samples indices into that list
        neg_edge_list_indices = random.sample(range(0, len(non_edges_one_side)), num_neg_samples)
        for i in neg_edge_list_indices:
            neg_edge_list.append(non_edges_one_side[i][1])
        return neg_edge_list

    # Sample 78 negative edges
    neg_edge_list = sample_negative_edges(G, len(pos_edge_list))
    # Transform the negative edge list to tensor
    neg_edge_index = edge_list_to_tensor(neg_edge_list)
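    The enumerate/index indirection is not strictly necessary; random.sample can draw from the non-edge list directly. A minimal sketch with the same behavior (the function name is my own):

    def sample_negative_edges_simple(G, num_neg_samples):
        # nx.non_edges yields each absent node pair exactly once, with no self-loops
        return random.sample(list(nx.non_edges(G)), num_neg_samples)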

    3 Node Embedding Learning

    3.1 Using the torch.nn.Embedding class

    Note: an Embedding is essentially an N x M lookup table; every node in G gets a corresponding M-dimensional embedding in that table.

    ① Initializing torch.nn.Embedding:

    import torch.nn as nn

    # Initialize an embedding layer
    # Suppose we want to have embedding for 4 items (e.g., nodes)
    # Each item is represented with 8 dimensional vector
    emb_sample = nn.Embedding(num_embeddings=4, embedding_dim=8)
    print('Sample embedding layer: {}'.format(emb_sample))
    # Output: Sample embedding layer: Embedding(4, 8)

    Using torch.nn.Embedding:

    The index tensor must be a torch.LongTensor; it can be 1-D or 2-D, with the exact shape chosen according to the shape the output should have. (The id tensor mediates between the nn.Embedding weight matrix and the target tensor.)

    # Select an embedding in emb_sample
    id = torch.LongTensor([1])
    print(emb_sample(id))
    # Select multiple embeddings
    ids = torch.LongTensor([1, 3])
    print(emb_sample(ids))
    # Get the shape of the embedding weight matrix
    shape = emb_sample.weight.data.shape
    print(shape)
    # Overwrite the weight to tensor with all ones
    emb_sample.weight.data = torch.ones(shape)
    # Let's check if the emb is indeed initialized
    ids = torch.LongTensor([0, 3])
    print(emb_sample(ids))
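    The index tensor can also be 2-D, in which case the output gains an extra dimension (a small illustration, my addition):

    ids_2d = torch.LongTensor([[0, 1], [2, 3]])
    print(emb_sample(ids_2d).shape)  # torch.Size([2, 2, 8])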

    Overwriting the values of torch.nn.Embedding:

    # Overwrite the weight to tensor with all ones
    emb_sample.weight.data = torch.ones(shape)

    Next, implement a function that creates an nn.Embedding whose entries follow a uniform distribution on [0, 1); torch.rand(a, b) generates an a x b matrix with exactly that distribution.

    Note: some people online claim that nn.Embedding is already initialized in this range (in fact its default initialization is a standard normal distribution); overwriting the weights here, together with torch.manual_seed(), guarantees reproducible results.

    # Please do not change / reset the random seed
    torch.manual_seed(1)

    def create_node_emb(num_node=34, embedding_dim=16):
        # TODO: Implement this function that will create the node embedding matrix.
        # A torch.nn.Embedding layer will be returned. You do not need to change
        # the values of num_node and embedding_dim. The weight matrix of returned
        # layer should be initialized under uniform distribution.
        emb = None
        ############# Your code here ############
        emb = torch.nn.Embedding(num_node, embedding_dim)
        emb.weight.data = torch.rand(num_node, embedding_dim)
        #########################################
        return emb

    emb = create_node_emb()
    ids = torch.LongTensor([0, 3])
    # Print the embedding layer
    print("Embedding: {}".format(emb))
    # An example that gets the embeddings for node 0 and 3
    print(emb(ids))

    3.2 Visualizing the nn.Embedding instance emb

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def visualize_emb(emb):
        # Project the embeddings down to 2-D with PCA
        X = emb.weight.data.numpy()
        pca = PCA(n_components=2)
        components = pca.fit_transform(X)
        plt.figure(figsize=(6, 6))
        club1_x = []
        club1_y = []
        club2_x = []
        club2_y = []
        # Split nodes by the 'club' attribute of the karate club graph
        for node in G.nodes(data=True):
            if node[1]['club'] == 'Mr. Hi':
                club1_x.append(components[node[0]][0])
                club1_y.append(components[node[0]][1])
            else:
                club2_x.append(components[node[0]][0])
                club2_y.append(components[node[0]][1])
        plt.scatter(club1_x, club1_y, color="red", label="Mr. Hi")
        plt.scatter(club2_x, club2_y, color="blue", label="Officer")
        plt.legend()
        plt.show()

    # Visualize the initial random embedding
    visualize_emb(emb)

    3.3 Training the torch.nn.Embedding instance emb (Question 7)

    Note: the goal is for each node's embedding in the emb table to capture that node's structural role as well as possible. Unlike a supervised classification task, where predictions can be compared against labels directly, there is no ground truth here, so the approach is:

    If the model learns the nodes' structural features well, then a node's embedding should be similar to those of its neighbors (similarity is measured by the dot product, akin to cosine similarity). The model's outputs are therefore dot products, and the label for each dot product is whether the corresponding edge exists in the graph (i.e., whether the two nodes are directly connected). A small illustration follows.
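    A minimal sketch of this idea (my addition; node ids 0 and 1 are purely illustrative, and emb is the layer created above):

    u, v = torch.LongTensor([0]), torch.LongTensor([1])
    score = (emb(u) * emb(v)).sum()  # dot product of the two embeddings
    prob = torch.sigmoid(score)      # read as the probability that edge (0, 1) exists
    print(prob.item())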

    The full training code follows:

    Tip 1: tensors directly support operators such as +, -, *, /, ==, >, sigmoid, mul, and so on; see the accuracy() function and Version 2 of train().

    Tip 2: item() converts a single-element tensor to an ordinary Python scalar.

    Tip 3: the optimizer's gradients must be zeroed at every epoch.

    Tip 4: the loss should decrease as the epochs progress; if it does not trend downward, the model likely has a bug, e.g., the gradients were not zeroed.

    # Question 7
    from torch.optim import SGD
    import torch.nn as nn

    def accuracy(pred, label):
        # TODO: Implement the accuracy function. This function takes the
        # pred tensor (the resulting tensor after sigmoid) and the label
        # tensor (torch.LongTensor). Predicted value greater than 0.5 will
        # be classified as label 1. Else it will be classified as label 0.
        # The returned accuracy should be rounded to 4 decimal places.
        # For example, accuracy 0.82956 will be rounded to 0.8296.
        accu = 0.0
        ############# Your code here ############
        accu = ((pred > 0.5) == label).sum().item() * 1.0 / pred.shape[0]
        accu = round(accu, 4)
        #########################################
        return accu

    def train(emb, loss_fn, sigmoid, train_label, train_edge):
        # TODO: Train the embedding layer here. You can also change epochs and
        # learning rate. In general, you need to implement:
        # (1) Get the embeddings of the nodes in train_edge
        # (2) Dot product the embeddings between each node pair
        # (3) Feed the dot product result into sigmoid
        # (4) Feed the sigmoid output into the loss_fn
        # (5) Print both loss and accuracy of each epoch
        # (6) Update the embeddings using the loss and optimizer
        # (as a sanity check, the loss should decrease during training)
        epochs = 500
        learning_rate = 0.1
        optimizer = SGD(emb.parameters(), lr=learning_rate, momentum=0.9)
        for i in range(epochs):
            ############# Your code here (Version 2, vectorized) ############
            optimizer.zero_grad()
            tmp = emb(train_edge)                      # shape [2, E, embedding_dim]
            pred = sigmoid(tmp[0].mul(tmp[1]).sum(1))  # per-edge dot products
            loss = loss_fn(pred, train_label)
            print(f"Epochs: {i}, Accuracy: {accuracy(pred, train_label)}, Loss: {loss}")
            loss.backward()
            optimizer.step()
            #########################################
            ############# Your code here (Version 1, edge by edge) ############
            '''
            optimizer.zero_grad()
            pred = torch.Tensor(train_edge.shape[1])
            for j in range(train_edge.shape[1]):
                node_ids = torch.LongTensor([train_edge[0][j], train_edge[1][j]])
                tmp = emb(node_ids)
                pred[j] = sigmoid(tmp[0].dot(tmp[1]))
            loss = loss_fn(pred, train_label)
            print(f"Epochs: {i}, Accuracy: {accuracy(pred, train_label)}, Loss: {loss}")
            loss.backward()
            optimizer.step()
            '''
            #########################################

    loss_fn = nn.BCELoss()
    sigmoid = nn.Sigmoid()
    print(pos_edge_index.shape)
    print(neg_edge_index.shape)
    # Generate the positive and negative labels
    pos_label = torch.ones(pos_edge_index.shape[1], )
    neg_label = torch.zeros(neg_edge_index.shape[1], )
    # Concat positive and negative labels into one tensor
    train_label = torch.cat([pos_label, neg_label], dim=0)
    # Concat positive and negative edges into one tensor
    # Since the network is very small, we do not split the edges into val/test sets
    train_edge = torch.cat([pos_edge_index, neg_edge_index], dim=1)
    print(train_edge.shape)
    train(emb, loss_fn, sigmoid, train_label, train_edge)
    # Visualize the final learned embedding
    visualize_emb(emb)
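    A design note (my addition, not part of the original solution): applying nn.Sigmoid and nn.BCELoss separately works, but PyTorch's nn.BCEWithLogitsLoss fuses the two for better numerical stability. A hypothetical variant of one training step:

    def train_step_fused(emb, optimizer, train_label, train_edge):
        # Same computation as Version 2, but feeding raw dot products (logits)
        # into the fused loss instead of sigmoid outputs into BCELoss.
        loss_fn = nn.BCEWithLogitsLoss()
        optimizer.zero_grad()
        tmp = emb(train_edge)               # [2, E, dim]
        logits = tmp[0].mul(tmp[1]).sum(1)  # raw dot products, no sigmoid
        loss = loss_fn(logits, train_label)
        loss.backward()
        optimizer.step()
        return loss.item()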

    Partial results:


    Average degree of karate club network is 5
    Average clustering coefficient of karate club network is 0.57
    The PageRank value for node 0 after one iteration is 0.13
    The node 5 has closeness centrality 0.01

    End...

  • Original post: https://blog.csdn.net/qq_41661919/article/details/126202036