Published article "Neural networks made easy (Part 77): Cross-Covariance Transformer (XCiT)".

Our models often rely on attention mechanisms, and the Transformer is probably the one we use most. Its main drawback is its high demand for computing resources. In this article, we will look at a new algorithm, the Cross-Covariance Transformer (XCiT), that can help reduce computing costs without sacrificing quality.