OpenCL: internal implementation tests in MQL5 - page 40

 
joo:

1. Computer games don't care about GPU temperature and load the card to the max, and nothing happens: the graphics cards keep working.

2. In that case an MQL function returning the CPU temperature is needed as well :), otherwise the processor might burn out.

3. It's not a problem, of course.

1. How do you know they don't care? That's not a given. Some may not care. And there are plenty of people on the forums who have burned out their cards playing games.

Swedish enthusiasts found out the hard way that driver version 267.52 can damage the high-end GeForce GTX 590. The first dual-GPU 3D card burned out during an overclocking attempt. Since the GPU voltage had been raised, the experimenters decided to see how a second card would behave, and it suffered the same fate. With help from NVIDIA, the persistent Swedes figured out that the failure was caused by a driver bug that prevents the overload protection from kicking in.

The process of the card failure is captured in an instructive video:



The test was repeated with a new driver version, 267.71. It confirmed that the bug is fixed and the protection mechanism works. Unfortunately, the cards shipped with the faulty driver. Note that the protection mechanism is not only for cases where the user experiments with frequencies and voltages: an abnormal situation can also arise during normal use, and then the user's only hope is the safeguard built in by the developers.

Enthusiasts are advised never to install the bundled driver and to download a newer version from the NVIDIA website instead. In addition, overclocking enthusiasts should make sure the PC case is well ventilated.

2. I'm not arguing with that. But it's not that critical: when the processor overheats, Windows just crashes with a blue screen. It's unpleasant, but it doesn't compare. :)

3. It's definitely feasible: there are a heck of a lot of programs that monitor the temperature and other parameters of the card (I already have four of them). They get this information somehow, don't they?

P.S. There was also a rumor that cards begin to miscalculate at high temperatures. In games that isn't critical, but it's still undesirable.

 

The cards burned out because of a bug in the video card driver, which has built-in overload protection mechanisms, and not because the software running on the GPU fails to monitor the temperature.

Nobody is immune to driver errors, and a card may burn out because of a driver error even when no applications are running on the GPU.

You don't have to worry about it - load your hardware to the max, nothing bad can happen even if you try hard. Modern hardware is packed with protection systems against overloads, both CPU and GPU. Long gone are the days when by removing the cooler from the CPU you could literally start a fire.

 

joo 2012.03.21 09:06

The cards burned out because of a bug in the video card driver, which has built-in overload protection mechanisms, and not because the software running on the GPU fails to monitor the temperature.

Nobody is immune to driver errors, and a card may burn out because of a driver error even when no applications are running on the GPU.

You don't have to worry about it - load your hardware to the max, nothing bad can happen even if you try hard. Modern hardware is packed with protection systems against overloads, both CPU and GPU. Long gone are the days when removing a cooler from the CPU could literally start a fire.

I agree: temperature control is a low-level service task (maybe even a hardware task), and controlling the temperature from software written in a high-level language is a suicidal way to go.

 

Has anyone tried torturing the GPU like this?

The script runs loops in parallel, each with 100000000 (one hundred million) iterations.

What are your impressions?

//+------------------------------------------------------------------+
//|                                                   OpenCLTest.mq5 |
//|                        Copyright 2011, MetaQuotes Software Corp. |
//|                                              http://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2011, MetaQuotes Software Corp."
#property link      "http://www.mql5.com"
#property version   "1.00"

//——————————————————————————————————————————————————————————————————————————————
const string cl_src=
"__kernel void MFractal(                                    \r\n"
"                       __global int *out                   \r\n"
"                      )                                    \r\n"
"  {                                                        \r\n"
"   int i = get_global_id(0);                               \r\n"
"   for(int u=0;u<100000000;u++)                            \r\n"
"   {                                                       \r\n"
"    out[i]+=u;                                             \r\n"
"    if(out[i]>10000)                                       \r\n"
"      out[i]=0;                                            \r\n"
"   }                                                       \r\n"
"   out[i]+= i;                                             \r\n"
"  }                                                        \r\n";
//——————————————————————————————————————————————————————————————————————————————


#define BUF_SIZE 480


//——————————————————————————————————————————————————————————————————————————————
void OnStart()
{
  int cl_ctx; // context handle
  int cl_prg; // program handle
  int cl_krn; // kernel handle
  int cl_mem; // buffer handle


  //----------------------------------------------------------------------------
  //--- initialize the OpenCL objects
  if((cl_ctx=CLContextCreate(false))==0)
  {
    Print("OpenCL not found");
    return;
  }
  if((cl_prg=CLProgramCreate(cl_ctx,cl_src))==0)
  {
    CLContextFree(cl_ctx);
    Print("OpenCL program create failed");
    return;
  }
  if((cl_krn=CLKernelCreate(cl_prg,"MFractal"))==0)
  {
    CLProgramFree(cl_prg);
    CLContextFree(cl_ctx);
    Print("OpenCL kernel create failed");
    return;
  }
  if((cl_mem=CLBufferCreate(cl_ctx,BUF_SIZE*sizeof(float),CL_MEM_READ_WRITE))==0)
  {
    CLKernelFree(cl_krn);
    CLProgramFree(cl_prg);
    CLContextFree(cl_ctx);
    Print("OpenCL buffer create failed");
    return;
  }
  //----------------------------------------------------------------------------


  //--- prepare for execution
  uint  offset[1]={0};
  uint  work  [1]={BUF_SIZE};


  //--- set the constant parameters of the OpenCL function
  //CLSetKernelArg   (cl_krn,4,max);
  CLSetKernelArgMem(cl_krn,0,cl_mem);


  //--- prepare a buffer for the pixel output
  uint buf[];
  ArrayResize(buf,BUF_SIZE);


  uint x=GetTickCount();

  //--- set the variable parameters
  //CLSetKernelArg(cl_krn,0,x0);
  //CLSetKernelArg(cl_krn,1,y0);
  //CLSetKernelArg(cl_krn,2,x1);
  //CLSetKernelArg(cl_krn,3,y1);

  //--- compute on the GPU
  CLExecute(cl_krn,1,offset,work);

  //--- read the data back from the buffer
  CLBufferRead(cl_mem,buf);

  //--- print the computation time
  Print(IntegerToString(GetTickCount()-x)+" msec");

  
/*
  //--- let's see what the GPU has computed for us
  for(int i=0;i<BUF_SIZE;i++)
  {
    Print(buf[i]);
  }
*/

  //--- delete the OpenCL objects
  CLBufferFree (cl_mem);
  CLKernelFree (cl_krn);
  CLProgramFree(cl_prg);
  CLContextFree(cl_ctx);
}
//——————————————————————————————————————————————————————————————————————————————

2012.03.21 18:20:36 Tast_Mand_ (EURUSD,H1) 5741 msec

2012.03.21 18:15:53 Terminal CPU: GenuineIntel Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz with OpenCL 1.1 (4 units, 3311 MHz, 8174 Mb, version 2.0)

2012.03.21 18:15:53 Terminal GPU: NVIDIA Corporation GeForce GTX 570 with OpenCL 1.1 (15 units, 1464 MHz, 1280 Mb, version 296.10)
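
To check the values the script reads back (the printing loop is commented out in the script), the work done by a single work-item can be reproduced on the CPU. The following is only a minimal sketch in plain C, not part of the original post. The function name `mfractal_ref` and its parameters are hypothetical, and the starting value of the buffer element is passed in explicitly because the script never initializes the buffer before running the kernel.

```c
/* CPU reference for one work-item of the MFractal kernel above.
 * Hypothetical helper, not from the original post.
 *   start - initial value of out[i] (the script leaves it uninitialized)
 *   n     - loop iteration count (100000000 in the kernel)
 *   i     - global work-item id
 * The running value never exceeds 10000 + n, so int does not overflow
 * for the kernel's iteration count. */
int mfractal_ref(int start, int n, int i)
{
    int out = start;
    for (int u = 0; u < n; u++) {
        out += u;
        if (out > 10000)   /* same reset condition as in the kernel */
            out = 0;
    }
    return out + i;        /* the kernel adds the work-item id at the end */
}
```

With a small `n` the result is easy to verify by hand, which makes the helper handy for comparing against a shortened kernel run before launching the full one-hundred-million-iteration version.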

 
MetaDriver: There was also a rumor that cards begin to miscalculate at high temperatures.
Is it because of the mass death of flies or something?
 
Mathemat:
Is it because of the mass death of flies or something?
Try running the test (see my post above). Does the screen freeze, does the mouse cursor move?
 
Mathemat:
Is it because of the mass death of flies or something?

That's unlikely, since the process is mostly reversible. Fluctuations of the electron-hole plasma... (how about that!). Individual bits flip occasionally. You and I are out of our way. :)

But it does freak me out when the card gets to 90 °C. In that case a pause of a couple thousand milliseconds between series of runs at least holds the temperature down to 82 °C.

I have put such a pause in, but it would be better to have the flexibility to insert it only when needed, with the temperature threshold set programmatically.
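
The idea of pausing only when the card is hot can be sketched as a tiny policy function. This is an illustration in plain C under stated assumptions, not something from the thread: the name `cooldown_pause_ms` and its parameters are hypothetical, and the temperature is assumed to come from some external source, since the thread itself is asking for a function that would return it.

```c
/* Hypothetical pause policy: return how long to wait (in milliseconds)
 * before the next series of kernel runs.
 *   temp_c      - current GPU temperature in degrees Celsius,
 *                 obtained by some external means (an assumption)
 *   threshold_c - temperature at or above which a pause is inserted
 *   pause_ms    - pause length to use when the threshold is reached */
unsigned int cooldown_pause_ms(double temp_c, double threshold_c,
                               unsigned int pause_ms)
{
    return (temp_c >= threshold_c) ? pause_ms : 0u;
}
```

In a script, the returned value would simply be passed to a sleep call between series of runs, so that no time is lost while the card stays below the threshold.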

 
joo: Try running the test (see my post above). Does the screen freeze, does the mouse cursor move?

Well, I don't have hardware as powerful as yours.

I'll try it now, but I'll put the AMD driver back in.

 
Mathemat:

I don't have hardware as powerful as yours.

I'll try it now, but I'll put the AMD driver back in.

By the way, I now have the AMD OpenCL driver for my CPU. I had to install it because the Intel one installs without errors, but the processor is not detected as an OpenCL device.
 
2012.03.21 15:45:49     Tast_Mand_ (EURUSD,H1)  16801 msec

2012.03.21 15:42:19     Terminal        CPU: AuthenticAMD AMD Athlon(tm) II X4 630 Processor with OpenCL 1.1 (4 units, 2998 MHz, 2048 Mb, version 2.0)
2012.03.21 15:42:19     Terminal        GPU: NVIDIA Corporation GeForce GT 440 with OpenCL 1.1 (2 units, 1660 MHz, 1024 Mb, version 295.73)
I didn't notice any strain on the video card: everything moves and responds to clicks. I opened Task Manager during the test and that was fine too, with no lag; only the test time went up to 17 seconds.