56 Commits (2bb44454a548b931153b7bb304bc42fa9329a4ef)

Author SHA1 Message Date
AlexeyAB c0e2512af2 Activation improvement, more robust timer. 7 years ago
AlexeyAB 7dd97537fb XNOR-net tiny-yolo_xnor.cfg ~2x faster than cuDNN on CUDA (nVidia GPU Maxwell) 7 years ago
AlexeyAB 03e95320a1 XNOR coalesced memory access, and avoid bank conflicts 7 years ago
AlexeyAB ca43bbdaae Fixed openmp bugs for XNOR 7 years ago
AlexeyAB c0e01fd63c Test for XNOR-conv on CUDA 7 years ago
AlexeyAB b141f85cab Compile fix 7 years ago
AlexeyAB 007878393f Temporary Slow implementation of XNOR on CUDA (shared_memory) 7 years ago
AlexeyAB c4a9e3422e Temporary implementation of XNOR on CUDA 7 years ago
AlexeyAB 9753b72aeb temp fix, don't use it 7 years ago
AlexeyAB cfc5fedbb6 Just used spaces for indents instead of Tabs 7 years ago
AlexeyAB 9bae70b225 Accelerated by another 5% using FP16/32 Batch-norm for Tensor Cores. 7 years ago
AlexeyAB 537d135feb Improve training performance - batch-norm using cuDNN. 7 years ago
AlexeyAB 880cf187d8 Fixed multi-GPU training for Tensor Cores 7 years ago
AlexeyAB cad4d1618f Added support for Tensor Cores CC >= 7.0 (V100). For FP16/32 (mixed precision) define CUDNN_HALF should be used. 7 years ago
AlexeyAB cd2bdec090 Updated to CUDA 9.1. And fixed no_gpu dependencies. 7 years ago
AlexeyAB 6332ea99ab one more fix 7 years ago
AlexeyAB b2b5756d86 Added __float2half_rn() and __half2float() 7 years ago
AlexeyAB dda993f3dd Use half_float16 instead of float32 if defined both CUDNN and CUDNN_HALF. Use Tensor Cores. 7 years ago
AlexeyAB 9920410ba9 minor fix 8 years ago
AlexeyAB d7a30ada7e Fixed behavior if missing library cudnn.lib 8 years ago
AlexeyAB 3b9afd4cd2 Fixed behavior if missing library cudnn.lib 8 years ago
Joseph Redmon 75fe603722 :vegan: :charizard: 9 years ago
Joseph Redmon c7a700dc22 new font strategy 9 years ago
Joseph Redmon 352ae7e65b ADAM 9 years ago
Joseph Redmon 73f7aacf35 better multigpu 9 years ago
Joseph Redmon 5c067dc447 good chance I didn't break anything 9 years ago
Joseph Redmon 8f1b4e0962 updates and things 9 years ago
Joseph Redmon afb8b4f98b CVPR prep 9 years ago
Joseph Redmon 08c7cf9c88 no mean on input binarization 9 years ago
Joseph Redmon 8322a58cf6 hate warnings 9 years ago
Joseph Redmon 729ce43e6e stuff 9 years ago
Joseph Redmon ec3d050a76 hope i didn't break anything 9 years ago
Joseph Redmon 13209df7bb art, cudnn 9 years ago
Joseph Redmon c7b10ceadb so much need to commit 9 years ago
Joseph Redmon cff59ba135 go updates 9 years ago
Joseph Redmon d1965bdb96 Go 9 years ago
Joseph Redmon 16d06ec0db stuff 9 years ago
Joseph Redmon 913d355ec1 lots of stuff 9 years ago
Joseph Redmon 892923514f fixed darknet, stuff 10 years ago
Joseph Redmon c2738835f0 Faster batch normalization 10 years ago
Joseph Redmon 0f7f2899b6 Fix for cuda 7.5 10 years ago
Joseph Redmon 8fd18add6e CVPR Experiments 10 years ago
Joseph Redmon d00f0a1ccd Changes to make routing work better 10 years ago
Joseph Redmon 6553b3f0e3 no comment 10 years ago
Joseph Redmon d7d7da2653 Fixed im2col mistake >< face#palm 10 years ago
Joseph Redmon e92f7d301c smaller gridsize in bias 10 years ago
Joseph Redmon 7100de0b59 going to break stuff 10 years ago
Joseph Redmon 664c5dd2f2 Subdivisions for batches 10 years ago
Joseph Redmon 9d418102f4 using caffe's im2col, it's so much better! 10 years ago
Joseph Redmon 4af116e996 gonna change im2col 10 years ago