clear-linux-documentation/source/clear-linux/tutorials/fmv.rst

.. _fmv:

Use the function multi-version patch generator
##############################################

CPU architectures often gain interesting new instructions as they evolve but
application developers find it difficult to take advantage of those
instructions. The reluctance to lose backward-compatibility is one of the
main roadblocks slowing developers from using advancements in newer computing
architectures. :abbr:`FMV (Function Multi-Versioning)`, which first appeared
in `GCC`_ 4.8, is a way to have multiple implementations of a function, each
using a different architecture specialized instruction-set extensions. GCC
6 introduces changes to FMV to make it even easier to bring architecture-
based optimizations to the application code.

In this tutorial we will use FMV on general code and on
:abbr:`FFT (Fast Fourier Transform)` library code. Upon completing the
tutorial, you will be able to use this technology on your code and use the
libraries to deploy architecture-based optimizations to your application code.

Install and configure a Clear Linux host on bare metal
******************************************************

First, follow our guide to :ref:`bare-metal-install`.

Once the bare metal installation and initial configuration are complete,
add the `desktop-dev` bundle to the system.

`desktop-dev`: contains the necessary development tools like GCC and Perl\*.

To install the bundles, run the following command in the :file:`$HOME`
directory:

.. code-block:: bash

   sudo swupd bundle-add desktop-dev

Detect loop vectorization candidates
************************************

Now, we need to detect the loop vectorization candidates to be cloned for
multiple platforms with FMV. As an example, we will use the following
simple C code:

.. code-block:: c
   :linenos:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #define MAX 1000000

    int a[256], b[256], c[256];

    void foo(){
        int i,x;
        for (x=0; x<MAX; x++){
            for (i=0; i<256; i++){
                a[i] = b[i] + c[i];
            }
        }
    }


    int main(){
        foo();
        return 0;
    }

Save the example code as :file:`example.c` in the current directory and build
with the following flags:

.. code-block:: bash

        gcc -O3  -fopt-info-vec  example.c -o example

The build generates the following output:

.. code-block:: console

    example.c:11:9: note: loop vectorized
    example.c:11:9: note: loop vectorized

The output shows that line 11 is a good candidate for vectorization:

.. code-block:: c

    for (i=0; i<256; i++){
        a[i] = b[i] + c[i];

Generate the FMV patch
**********************

To generate the FMV patch with the `make-fmv-patch`_ project, we
must clone the project and generate a log file with the loop vectorized
information:

.. code-block:: bash

        git clone https://github.com/clearlinux/make-fmv-patch.git
        gcc -O3  -fopt-info-vec  example.c -o example &> log

To generate the patch files, execute:

.. code-block:: bash

        perl ./make-fmv-patch/make-fmv-patch.pl log .

The :file:`make-fmv-patch.pl` script takes two arguments: `<buildlog>` and
`<sourcecode>`. Replace `<buildlog>` and `<sourcecode>` with the proper
values and execute:

.. code-block:: bash

        perl make-fmv-patch.pl <buildlog> <sourcecode>

The command generates the following :file:`example.c.patch` patch:

.. code-block:: console

    --- ./example.c 2017-09-27 16:05:42.279505430 +0000
    +++ ./example.c~    2017-09-27 16:19:11.691544026 +0000
    @@ -5,6 +5,7 @@

     int a[256], b[256], c[256];

    +__attribute__((target_clones("avx2","arch=atom","default")))
     void foo(){
         int i,x;
         for (x=0; x<MAX; x++){

We recommend you use the :file:`make-fmv-patch` script to add the attribute
generating the target clones on the function `foo`. Thus, we can have the
following code:

.. code-block:: c

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #define MAX 1000000

    int a[256], b[256], c[256];

    __attribute__((target_clones("avx2","arch=atom","default")))
    void foo(){
        int i,x;
        for (x=0; x<MAX; x++){
            for (i=0; i<256; i++){
                a[i] = b[i] + c[i];
            }
        }
    }


    int main(){
        foo();
        return 0;
    }

Changing the value of the `$avx2` variable, we can change the target
clones when adding the patches or in the :file:`make-fmv-patch.pl` script:

.. code-block:: perl

    my $avx2 = '__attribute__((target_clones("avx2","arch=atom","default")))'."\n";

Compile the code again with FMV and add the option to analyze the `objdump`
log:

.. code-block:: bash

    gcc -O3 example.c -o example -g
    objdump -S example | less

You can see the multiple clones of the `foo` function:

.. code-block:: console

    foo
    foo.avx2.0
    foo.arch_atom.1

The cloned functions use AVX2 registers and vectorized instructions. To
verify, enter the following commands:

::

    vpaddd (%r8,%rax,1),%ymm0,%ymm0
    vmovdqu %ymm0,(%rcx,%rax,1)

FTT project example
*******************

To follow the same approach with a package like FFT, we must use the
`-fopt-info-vec` flag to get a build log file similar to:

.. code-block:: bash

    ~/make-fmv-patch/make-fmv-patch.pl results/build.log fftw-3.3.6-pl2/

    patching fftw-3.3.6-pl2/libbench2/verify-lib.c @ lines (36 114 151 162 173 195 215 284)
    patching fftw-3.3.6-pl2/tools/fftw-wisdom.c @ lines (150)
    patching fftw-3.3.6-pl2/libbench2/speed.c @ lines (26)
    patching fftw-3.3.6-pl2/tests/bench.c @ lines (27)
    patching fftw-3.3.6-pl2/libbench2/util.c @ lines (181)
    patching fftw-3.3.6-pl2/libbench2/problem.c @ lines (229)
    patching fftw-3.3.6-pl2/tests/fftw-bench.c @ lines (101 147 162 249)
    patching fftw-3.3.6-pl2/libbench2/mp.c @ lines (79 190 215)
    patching fftw-3.3.6-pl2/libbench2/caset.c @ lines (5)
    patching fftw-3.3.6-pl2/libbench2/verify-r2r.c @ lines (44 187 197 207 316 333 723)

For example, the :file:`fftw-3.3.6-pl2/tools/fftw-wisdom.c.patch` file
generates the following patches:

.. code-block:: diff
   :linenos:

       --- fftw-3.3.6-pl2/libbench2/verify-lib.c   2017-01-27 21:08:13.000000000 +0000
       +++ fftw-3.3.6-pl2/libbench2/verify-lib.c~  2017-09-27 17:49:21.913802006 +0000
       @@ -33,6 +33,7 @@

        double dmax(double x, double y) { return (x > y) ? x : y; }

       +__attribute__((target_clones("avx2","arch=atom","default")))
        static double aerror(C *a, C *b, int n)
        {
            if (n > 0) {
       @@ -111,6 +112,7 @@
       }

       /* make array hermitian */
       +__attribute__((target_clones("avx2","arch=atom","default")))
       void mkhermitian(C *A, int rank, const bench_iodim *dim, int stride)
       {
            if (rank == 0)
       @@ -148,6 +150,7 @@
       }

       /* C = A + B */
       +__attribute__((target_clones("avx2","arch=atom","default")))
       void aadd(C *c, C *a, C *b, int n)
       {
            int i;
       @@ -159,6 +162,7 @@
       }

       /* C = A - B */
       +__attribute__((target_clones("avx2","arch=atom","default")))
       void asub(C *c, C *a, C *b, int n)
       {
            int i;
       @@ -170,6 +174,7 @@
       }

       /* B = rotate left A (complex) */
       +__attribute__((target_clones("avx2","arch=atom","default")))
       void arol(C *b, C *a, int n, int nb, int na)
       {
            int i, ib, ia;
       @@ -192,6 +197,7 @@
            }
       }

With these patches, we can select where to apply the FMV technology making
bringing architecture-based optimizations to application code even easier.

**Congratulations!**

You have successfully installed an FMV development environment on Clear
Linux. Furthermore, you used cutting edge compiler technology to improve the
performance of your application based on Intel Architecture technology and
profiling of the specific execution of your application.

.. _GCC:  https://gcc.gnu.org
.. _make-fmv-patch: https://github.com/clearlinux/make-fmv-patch