Here's what you can do with OpenCL directly in MetaTrader 5 without any DLLs

 
Aleksei Grebenkin:

Renat Fatkhullin, how can I contact you to discuss writing in MQL5 + OpenCL? I need to use the processing power of video cards. If I understand correctly, here is an example from practice: the robot I wrote optimizes only 11 parameters on 3 machines connected via a local network, and the period is only 1 year, 6 hours. I tried to run a 5-year optimization with a full search, and it showed that I would have to wait 2 months. If I have understood correctly, OpenCL will solve the problem: the speed should increase hundreds of times, since the calculations will be done by video cards rather than processors. And once the entire trading system is taken into account, there will be about 200-300 parameters in the settings.

OpenCL calculations are not used to parallelize the optimization process.

With OpenCL, you can calculate a specific part of the algorithm faster and in parallel. We have many articles and discussions on OpenCL.
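As a minimal sketch of what calculating a specific part of the algorithm in parallel looks like, the script below squares every element of an array on whichever device CL_USE_ANY selects, with one work-item per element. The kernel name Square, the array size and the float data type are illustrative assumptions, not code from this thread, and error handling is kept to the essentials.

```mql5
//--- hypothetical kernel: each work-item squares one element of the buffer
const string cl_square_src=
   "__kernel void Square(__global float *data)\r\n"
   "  {\r\n"
   "   int i=get_global_id(0);\r\n"
   "   data[i]=data[i]*data[i];\r\n"
   "  }\r\n";
//+------------------------------------------------------------------+
//| Script: offload one simple step of a calculation to OpenCL       |
//+------------------------------------------------------------------+
void OnStart()
  {
   int   n=1024;                        // illustrative array size
   float data[];
   ArrayResize(data,n);
   for(int i=0;i<n;i++)
      data[i]=(float)i;
//--- context, program, kernel and buffer
   int ctx=CLContextCreate(CL_USE_ANY);
   if(ctx==INVALID_HANDLE)
     { Print("OpenCL not found"); return; }
   int prg=CLProgramCreate(ctx,cl_square_src);
   if(prg==INVALID_HANDLE)
     { CLContextFree(ctx); Print("OpenCL program create failed"); return; }
   int krn=CLKernelCreate(prg,"Square");
   if(krn==INVALID_HANDLE)
     { CLProgramFree(prg); CLContextFree(ctx); Print("OpenCL kernel create failed"); return; }
   int mem=CLBufferCreate(ctx,n*sizeof(float),CL_MEM_READ_WRITE);
   if(mem==INVALID_HANDLE)
     { CLKernelFree(krn); CLProgramFree(prg); CLContextFree(ctx); Print("OpenCL buffer create failed"); return; }
//--- upload the data, run n work-items in one dimension, read the result back
   CLBufferWrite(mem,data);
   CLSetKernelArgMem(krn,0,mem);
   uint offset[1]={0};
   uint work[1];
   work[0]=(uint)n;
   if(!CLExecute(krn,1,offset,work))
      Print("CLExecute failed, error ",GetLastError());
   else
     {
      CLBufferRead(mem,data);
      PrintFormat("data[10]=%.1f",data[10]);
     }
//--- release OpenCL objects in reverse order of creation
   CLBufferFree(mem);
   CLKernelFree(krn);
   CLProgramFree(prg);
   CLContextFree(ctx);
  }
```

Only the per-element loop moves to the device; the surrounding logic (and the tester's own optimization passes) still runs on the CPU as before.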

 
Maxim Kuznetsov #:

What's stopping you from buying a bigger card?

Something like an A100: https://www.nvidia.com/ru-ru/data-center/a100/

No sooner said than done.




A 2-3x speedup compared with the previous calculations on a GeForce RTX 2080 Ti.

But there is a separate point for anyone testing neural network models in the Strategy Tester.

OpenCL does not cope with multi-threaded access once there are more than 10-12 processes (agents).

This is especially true when several neural networks are created simultaneously to analyze different data and are then merged into one.

So even though the server now has 96 logical processors, we can only use 12.

Therefore it is more cost-effective to use several older computers on a network, which is probably many times cheaper.


I would also like to mention that installing the AMD SDK made it possible to use CPUs with OpenCL.

Now there are 96 devices ready to perform tasks at the same speed, without depending solely on the power of the graphics card.



However, the OpenCL library had to be corrected, because selecting the device with

CLContextCreate(CL_USE_ANY)

gives no way of knowing which device is currently loaded.

And selecting GPU-only or CPU-only does not allow both kinds of device to be used at the same time.
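One way to see exactly which device a context is bound to is to create it by device index rather than with CL_USE_ANY. The script below is a sketch under that assumption (it is a standalone script, not part of the library discussed here): it lists every OpenCL device, opens a context on each one explicitly with CLContextCreate(i) and prints its name, type and number of compute units.

```mql5
//+------------------------------------------------------------------+
//| Script: enumerate OpenCL devices and bind a context to each one  |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- CL_DEVICE_COUNT does not need a handle, so 0 is passed
   int count=(int)CLGetInfoInteger(0,CL_DEVICE_COUNT);
   PrintFormat("OpenCL devices found: %d",count);
   for(int i=0;i<count;i++)
     {
      //--- a context on device #i only (CL_USE_ANY would let the terminal choose)
      int ctx=CLContextCreate(i);
      if(ctx==INVALID_HANDLE)
        {
         PrintFormat("#%d: context creation failed, error %d",i,GetLastError());
         continue;
        }
      string name="";
      CLGetInfoString(ctx,CL_DEVICE_NAME,name);
      ENUM_CL_DEVICE_TYPE type=(ENUM_CL_DEVICE_TYPE)CLGetInfoInteger(ctx,CL_DEVICE_TYPE);
      PrintFormat("#%d: %s, %s, %I64d compute units",i,name,EnumToString(type),
                  CLGetInfoInteger(ctx,CL_DEVICE_MAX_COMPUTE_UNITS));
      CLContextFree(ctx);
     }
  }
```

The printed index is the value to pass to CLContextCreate() later, which lets GPUs and CPUs be used side by side instead of choosing one class of device for the whole run.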


To solve this problem, I benchmark each device for its current computation speed,

using this nice example to simulate the workload:

https://www.mql5.com/ru/code/825


In the library code it is implemented like this:

int COpenCL::ID_FasterDevice()
  {
 
   int cl_prg;
   int cl_krn;
   int cl_mem;
   int cl_ctx;
   string device;
   ulong speed [];
   
   int dCount= (int)CLGetInfoInteger(0,CL_DEVICE_COUNT);
  
   
   if (dCount>1)
   {
   ArrayResize(speed,dCount);
   
      //----------------- measure the current performance and select the faster device ----------
      for(int i = 0; i<dCount;i++)
         {
         cl_ctx=i;
         CLGetInfoString(cl_ctx,CL_DEVICE_NAME,device);
         Print(cl_ctx,": ",device);
         ulong start_time=GetMicrosecondCount();
     
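//--- initializing OpenCL objects on device #i (CLContextCreate() without an argument
//--- lets the terminal pick the device, so every pass could benchmark the same one)
   if((cl_ctx=CLContextCreate(i))==INVALID_HANDLE)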
     {
      Print("OpenCL not found");
      return -1;
     }
   if((cl_prg=CLProgramCreate(cl_ctx,cl_src))==INVALID_HANDLE)
     {
      CLContextFree(cl_ctx);
      Print("OpenCL program create failed");
      return -1;
     }
   if((cl_krn=CLKernelCreate(cl_prg,"MFractal"))==INVALID_HANDLE)
     {
      CLProgramFree(cl_prg);
      CLContextFree(cl_ctx);
      Print("OpenCL kernel create failed");
      return -1;
     }
   if((cl_mem=CLBufferCreate(cl_ctx,SIZE_X*SIZE_Y*sizeof(uint),CL_MEM_READ_WRITE))==INVALID_HANDLE)
     {
      CLKernelFree(cl_krn);
      CLProgramFree(cl_prg);
      CLContextFree(cl_ctx);
      Print("OpenCL buffer create failed");
      return -1;
     }
//--- getting ready for execution
   float x0       =-2;
   float y0       =-0.5;
   float x1       =-1;
   float y1       = 0.5;
   uint  max      = 20000;
   uint  offset[2]={0,0};
   uint  work  [2]={SIZE_X,SIZE_Y};
   string objname ="OpenCL_"+IntegerToString(ChartID());
   string resname ="::Mandelbrot_"+IntegerToString(ChartID());
//--- setting unchangeable OpenCL function parameters
   CLSetKernelArg(cl_krn,4,max);
   CLSetKernelArgMem(cl_krn,5,cl_mem);
//--- creating the object for graphics display
   ChartRedraw();
   Comment("Benchmark OpenCl devices");
   ObjectCreate(0,objname,OBJ_BITMAP_LABEL,0,0,0);
   ObjectSetInteger(0,objname,OBJPROP_XDISTANCE,4);
   ObjectSetInteger(0,objname,OBJPROP_YDISTANCE,26);
//--- create initial empty picture
   uint buf[];

   ArrayResize(buf,SIZE_X*SIZE_Y);
   ResourceCreate(resname,buf,SIZE_X,SIZE_Y,0,0,SIZE_X,COLOR_FORMAT_XRGB_NOALPHA);
   ObjectSetString(0,objname,OBJPROP_BMPFILE,resname);
//--- rendering, till we are not stopped from the outside
   for (int samples=0;samples<100;samples++)
     {
      uint x=GetTickCount();
      //--- setting floating parameters
      CLSetKernelArg(cl_krn,0,x0);
      CLSetKernelArg(cl_krn,1,y0);
      CLSetKernelArg(cl_krn,2,x1);
      CLSetKernelArg(cl_krn,3,y1);
      //--- rendering the frame
      CLExecute(cl_krn,2,offset,work);
      //--- taking the frame data
      CLBufferRead(cl_mem,buf);
      //--- outputting the rendering time
      Comment(IntegerToString(GetTickCount()-x)+" msec per frame");
      //--- saving the frame in memory and drawing it
      ResourceCreate(resname,buf,SIZE_X,SIZE_Y,0,0,SIZE_X,COLOR_FORMAT_XRGB_NOALPHA);
      ChartRedraw();
      //--- a small pause and parameters update for the next frame
      Sleep(10);
      x0+=0.001 f;
      x1-=0.001 f;
      y0+=0.001 f;
      y1-=0.001 f;
     
     }
//--- removing OpenCL objects
   CLBufferFree(cl_mem);
   CLKernelFree(cl_krn);
   
   CLProgramFree(cl_prg);
   CLContextFree(cl_ctx);
         ulong finishtime=GetMicrosecondCount();
         ulong testtime= finishtime-start_time;  
         speed [i] = testtime; 
         
   ObjectDelete(0,objname);
   Comment("");
     }
      
      m_context= ArrayMinimum(speed,0,WHOLE_ARRAY);
   }
   else 
      m_context=-1;
//--- remove object
  
   return m_context;
  }
//+------------------------------------------------------------------+

In the EA code:

   COpenCL *TestOpenCL=new COpenCL;
   //--- benchmark the devices, then initialize the library on the fastest one
   int faster_device=TestOpenCL.ID_FasterDevice();
   TestOpenCL.Initialize(cl_program,faster_device,true);

In the OpenCL library, Initialize() must take the selected device into account:

//+------------------------------------------------------------------+
//| Initialize                                                       |
//+------------------------------------------------------------------+
bool COpenCL::Initialize(const string program,const int id_device=-1,const bool show_log=true)
  {
//--- create the context on the requested device; fall back to CL_USE_ANY when no index is given
   int device=(id_device>=0) ? id_device : CL_USE_ANY;
   if((m_context=CLContextCreate(device))==INVALID_HANDLE)
     {
      Print("OpenCL not found. Error code=",GetLastError());
      return(false);
     }
//--- ... the rest of Initialize() continues as in the original library

In tomorrow's release we are introducing built-in matrix and vector data types for use in machine learning.

This will make the code of MQL5 programs much simpler and will let us implement a large set of mathematical operations.

This is the first generation of the functionality; after that we will build more complex mechanisms to provide the capabilities of packages such as TensorFlow. OpenCL will come in handy for that.
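As a rough illustration of the simplification, assuming the vector/matrix API as documented in current MetaTrader 5 builds (matrix, vector, MatMul and element-wise +), the forward pass of one dense neural-network layer, y = Wx + b, becomes a single expression. The weights and inputs are made-up example values.

```mql5
//+------------------------------------------------------------------+
//| Script: one dense layer y = W*x + b with native matrix types     |
//+------------------------------------------------------------------+
void OnStart()
  {
   matrix W={{0.5,-1.0},
             {2.0, 0.0}};     // example 2x2 weight matrix
   vector x={1.0,2.0};        // example input
   vector b={0.1,0.1};        // example bias
//--- matrix-vector product followed by element-wise addition
   vector y=W.MatMul(x)+b;
   Print("y = ",y);
  }
```

Without these types the same step needs two nested loops over plain double arrays.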

 
What is the point of all this multithreading for a neural network, when each new epoch must rely on the results of the previous one? All the parallel threads will only repeat the results of the first one, and in the end they will write their results into one file, overwriting the results of the previous thread without actually changing the value...
 
Dmytryi Voitukhov #:
What is the point of all this multithreading for a neural network, when each new epoch must rely on the results of the previous one? All the parallel threads will only repeat the results of the first one, and in the end they will write their results into one file, overwriting the results of the previous thread without actually changing the value...

Renat is delighted, and you are whining. How old are you?

 

I slightly corrected the library and made the output nicer:

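         //--- elapsed time in seconds (divide by 1000.0 so the fractional part is not lost)
         double testtime=(GetTickCount()-start_time)/1000.0;
         //--- speed score: 1000/testtime rounded to the nearest multiple of 8, higher is faster
         //--- (speed[] must now be declared as double[])
         speed[i]=NormalizeDouble(MathRound(1000/testtime/8)*8,3);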
      
      CLGetInfoString(i,CL_DEVICE_NAME,device);
       Print("Device #", i,", speed =",speed [i]," FPS. ",device); 
   ObjectDelete(0,objname);
   Comment("");
     }
      
      m_context= ArrayMaximum(speed,0,WHOLE_ARRAY);
      CLGetInfoString(m_context,CL_DEVICE_NAME,device);
     
      Print("Faster device: #",m_context,", ",device," "); 

