Hi. The previous tf-NCF version was just too hard to modify: its negative sampling ran across multiple processes/threads, and it was hard to even tell what the sampling method was. On top of that, the loss may have been set up wrong and I had no idea how to fix it. Given all that, I decisively dropped that version and tried Microsoft's implementation instead. I had also tried NeuRec before, but it choked on even a moderately sized dataset (millions of interactions) and died outright, so I gave up on it.
1-Initialization methods
class Constant: Initializer that generates tensors with constant values.
class GlorotNormal: The Glorot normal initializer, also called Xavier normal initializer.
class GlorotUniform: The Glorot uniform initializer, also called Xavier uniform initializer.
class HeNormal: He normal initializer.
class HeUniform: He uniform variance scaling initializer.
class Identity: Initializer that generates the identity matrix.
class Initializer: Initializer base class: all Keras initializers inherit from this class.
class LecunNormal: Lecun normal initializer.
class LecunUniform: Lecun uniform initializer.
class Ones: Initializer that generates tensors initialized to 1.
class Orthogonal: Initializer that generates an orthogonal matrix.
class RandomNormal: Initializer that generates tensors with a normal distribution.
class RandomUniform: Initializer that generates tensors with a uniform distribution.
class TruncatedNormal: Initializer that generates a truncated normal distribution.
class VarianceScaling: Initializer capable of adapting its scale to the shape of weights tensors.
class Zeros: Initializer that generates tensors initialized to 0.
class constant: Initializer that generates tensors with constant values.
class glorot_normal: The Glorot normal initializer, also called Xavier normal initializer.
class glorot_uniform: The Glorot uniform initializer, also called Xavier uniform initializer.
class he_normal: He normal initializer.
class he_uniform: He uniform variance scaling initializer.
class identity: Initializer that generates the identity matrix.
class lecun_normal: Lecun normal initializer.
class lecun_uniform: Lecun uniform initializer.
class ones: Initializer that generates tensors initialized to 1.
class orthogonal: Initializer that generates an orthogonal matrix.
class random_normal: Initializer that generates tensors with a normal distribution.
class random_uniform: Initializer that generates tensors with a uniform distribution.
class truncated_normal: Initializer that generates a truncated normal distribution.
class variance_scaling: Initializer capable of adapting its scale to the shape of weights tensors.
class zeros: Initializer that generates tensors initialized to 0.
As you can see, Xavier has been renamed to Glorot; in tf they mean the same thing. No wonder I couldn't find it in tf 2.0+.
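A quick sanity check, since I went looking for this myself: in tf 2.x the Xavier initializers live under the Glorot names, and the lowercase strings work as aliases anywhere Keras accepts an initializer. A minimal sketch (none of this is from the NCF repo):

import tensorflow as tf

# GlorotUniform is what tf 1.x exposed as xavier_initializer
init = tf.keras.initializers.GlorotUniform(seed=42)
w = init(shape=(64, 32))  # sample a (64, 32) weight matrix
print(w.shape)

# the lowercase string alias works wherever Keras accepts an initializer
layer = tf.keras.layers.Dense(32, kernel_initializer="glorot_uniform")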
2-ml-1m training results at top-k@50, with a 0.8 training split, using the original code's data splitting and evaluation method
#NeuMF
Took 609.2795 seconds for training.
Took 38.0888 seconds for prediction.
MAP: 0.046699
NDCG: 0.168457
Precision@K: 0.093573
Recall@K: 0.206632

#GMF
Took 538.1528 seconds for training.
Took 37.3592 seconds for prediction.
MAP: 0.041022
NDCG: 0.152598
Precision@K: 0.087086
Recall@K: 0.176921

#MLP
Took 571.0663 seconds for training.
Took 37.4722 seconds for prediction.
MAP: 0.042092
NDCG: 0.156412
Precision@K: 0.087861
Recall@K: 0.187454
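For reference, Precision@K and Recall@K above follow the usual per-user definitions; below is a minimal sketch of those two (the textbook formulas, not necessarily the exact evaluation code in the Microsoft repo):

def precision_recall_at_k(ranked_items, true_items, k=50):
    # ranked_items: recommended item ids sorted by score, best first
    # true_items: set of held-out ground-truth item ids for this user
    hits = len(set(ranked_items[:k]) & true_items)
    precision = hits / k
    recall = hits / len(true_items) if true_items else 0.0
    return precision, recall

# toy example: 1 hit in the top 5, 2 relevant items overall
p, r = precision_recall_at_k([1, 2, 3, 4, 5], {2, 9}, k=5)
print(p, r)  # 0.2 0.5

The reported numbers are then averaged over all test users.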
3-The ml-20m data processing didn't go well; the process was killed outright
This comes down to negative sampling again: the developers didn't handle this step well and only reproduced results on small datasets. Reading the code, I found that they take every un-clicked item as a negative sample, so with millions of items the negatives already account for 99.999% of the data. That is completely unworkable, and memory simply explodes. I think you could cap it at 5x the click length: if a user clicked 100 items, randomly draw 500 negatives, which should at least avoid the memory explosion (a code sketch of this capping idea follows after the second log below). Even with memory under control, though, it still runs painfully slowly; unworkable, take two. Randomly sampling at 3x the click length gave the following results:
2021-04-20 19:06:21.490196: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5641d512d3e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-04-20 19:06:21.490217: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-04-20 19:06:21.490225: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-04-20 19:06:21.490231: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-04-20 19:21:54.006904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Took 5522.2788 seconds for training.
Killed
It got through only 4 epochs, averaging about 1400 s each, and strangely I have no idea what it was doing with the data between 19:06 and 19:21; why so slow? Terribly inefficient. And then in the test phase memory exploded again. Damn it. Fine: randomly sample just 100 negatives, train for only two epochs, and drop batch_size to 128. Why experiment on the 20m dataset? Because my own dataset is even bigger than that... Is there a 100M version? If there is, I'll use the 100m one.
Even with just 100 randomly sampled negatives it still got killed; same kind of log as above:
2021-04-21 09:27:33.984048: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-04-21 09:27:33.984057: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-04-21 09:27:33.984064: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-04-21 09:34:33.939384: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Took 3646.7352 seconds for training.
Killed
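To make the capping idea from section 3 concrete, here is a minimal sketch of drawing negatives at a fixed multiple of the click count instead of taking every un-clicked item; the function name and interface are my own, not the repo's:

import numpy as np

def sample_negatives(clicked, n_items, ratio=5, rng=None):
    # clicked: set of item ids the user interacted with
    # n_items: total catalog size
    # returns ratio * len(clicked) negative item ids
    rng = rng or np.random.default_rng()
    n_neg = ratio * len(clicked)
    negatives = set()
    while len(negatives) < n_neg:
        # draw a batch of candidates, drop positives and duplicates
        cand = rng.integers(0, n_items, size=n_neg)
        negatives.update(int(i) for i in cand if i not in clicked)
    return list(negatives)[:n_neg]

# toy usage: 3 clicks -> 15 negatives from a 1M-item catalog
print(len(sample_negatives({1, 2, 3}, 1_000_000, ratio=5)))

This keeps memory proportional to the number of interactions rather than the catalog size, which is the whole point; whether 3x, 5x, or a fixed 100 negatives is enough for good metrics is a separate question.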
So much of what once was and what has passed will eventually scatter with the wind. What remains is only bitter memory.
May we meet again someday, and may you still remember the topics we once discussed.