Setting up the JSON project file
================================

.. note:: A ready-to-use JSON file named ``elongator.json`` is provided in the tutorial directory,
        so you can skip directly to the next step.

Create Xlink Analyzer project
-----------------------------

The following instructions describe how to create the JSON file from scratch.

First, create an `XlinkAnalyzer <https://www.embl-hamburg.de/XlinkAnalyzer/XlinkAnalyzer.html>`_ project file for your complex.

.. note:: `XlinkAnalyzer <https://www.embl-hamburg.de/XlinkAnalyzer/XlinkAnalyzer.html>`_ is used here as a graphical interface for input preparation in Assembline.

    It does not matter if you do not have crosslinks: we use XlinkAnalyzer here to prepare the input file for modeling.

#. Open the Xlink Analyzer window:

    .. image:: images/xlinkanalyzer_window.png
      :width: 800

#. In the Xlink Analyzer project Setup tab, define subunits using the menu on the left. For each subunit, enter the name and the chain ID (or multiple comma-separated IDs), choose a color, and click the Add button.

    Set up the chain IDs as you want them in the final models; they do not have to correspond to the chain IDs in your input PDB files.

    The result should look like this:

    .. image:: images/xlinkanalyzer_elongator_subunits.png
      :width: 800

#. Click the Domains button and define the domains of Elp1 in the window that opens. They will be used later for adding restraints:

    .. image:: images/xlinkanalyzer_elongator_domains.png
      :width: 400
      :alt: Elongator Xlink Analyzer domains
   
    Close the Domains window.

#. Load sequence data using the panel on the right in the Setup tab.

    For this, prepare a single file with the sequences of all proteins in `FASTA format <https://en.wikipedia.org/wiki/FASTA_format>`_.

    Here, use the ``elp_sequences.fasta`` file provided in the tutorial materials.

    Upload the file using the Browse button, enter a name (e.g. "sequences"), and select the "sequence" type in the drop-down menu.

    Click the Add button.

    Map the sequence names to the names of subunits by clicking the Map button and selecting the subunits in the window that opens.
    After the mapping, click the "check" button that turns red.

    The result should look like this:

    .. image:: images/xlinkanalyzer_sequences.png
      :width: 800
   
#. Load crosslink data using the same panel on the right in the Setup tab.

    The crosslink files need to be provided in Xlink Analyzer or xQuest format.

    Here, the files in xQuest format have been prepared in the tutorial materials:

    .. code-block:: bash

        xlinks/
            DSS/
                inter_run3_190412.clean_strict.csv
                intra_run3_190412.clean_strict.csv
                loop_run3_190412.clean_strict.csv
                mono_run3_190412.clean_strict.csv
                sg1-inter.clean_strict.csv
                sg1-intra.clean_strict.csv
                sg1-loop.clean_strict.csv
                sg2-3-inter.clean_strict.csv
                sg2-3-intra.clean_strict.csv
                sg2-3-loop.clean_strict.csv
                sg2-3-mono.clean_strict.csv
            DSG/
                inter_dsg.clean_strict.csv
                inter_dsg_repeat.clean_strict.csv
                intra_dsg.clean_strict.csv
                intra_dsg_repeat.clean_strict.csv
                loop_dsg.clean_strict.csv
                loop_dsg_repeat.clean_strict.csv
                mono_dsg.clean_strict.csv
                mono_dsg_repeat.clean_strict.csv

    The files contain crosslinking results for two crosslinkers (DSS and DSG), split across multiple files (multiple runs, with inter/intra/loop/mono crosslinks in separate files).

    Name the first dataset "DSS", click the Browse button, and select all CSV files from the ``DSS`` directory. Set the type to match the file format (``xquest`` or ``XlinkAnalyzer``; the tutorial files are in xQuest format). Repeat for DSG.

    Map the crosslinked protein names to the subunit names using the Map button.

    After the operations above, you should end up with something like this:

    .. image:: images/xlinkanalyzer_elongator.png
      :width: 800
      :alt: Elongator Xlink Analyzer

#. Save the JSON file under a name like ::

    xla_project.json

#. And make a copy that you will modify for modeling ::
   
    cp xla_project.json elongator.json
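
Before editing the copy, it can help to confirm that it parses as JSON and contains the expected top-level keys. The sketch below uses only the Python standard library. The helper names ``load_project`` and ``summarize_project`` are hypothetical and not part of Xlink Analyzer or Assembline, and the key names are the ones shown in the project file outline in the next section.

.. code-block:: python

    import json


    def load_project(path):
        """Read a project file and return it as a Python dict."""
        with open(path) as handle:
            return json.load(handle)


    def summarize_project(project):
        """Return the top-level keys and the sizes of the main lists."""
        return {
            "keys": sorted(project),
            "n_data": len(project.get("data", [])),
            "n_subunits": len(project.get("subunits", [])),
        }

For example, ``summarize_project(load_project("elongator.json"))`` should list ``data``, ``subunits``, and ``xlinkanalyzerVersion`` among the keys of a freshly saved project. If the file is not valid JSON, ``json.load`` raises ``json.JSONDecodeError`` with the position of the problem.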


Add modeling information to the project file
--------------------------------------------

#. Open ``elongator.json`` in a text editor.

    .. note:: The project file is in `JSON format <https://en.wikipedia.org/wiki/JSON>`_.

        While it may look difficult to edit at first, it is quite manageable with a proper editor and a bit of practice.

        We recommend using a good editor such as:

            * `SublimeText <https://www.sublimetext.com/>`_
        
            * `Atom <https://atom.io/>`_

    At this point, the JSON has the following format:

    .. code-block:: json

        {
            "data": [
                {
                    "some xlink definition 1"
                },
                {
                    "some xlink definition 2"
                },
                {
                    "sequence file definition"
                }
            ],
            "subunits": [
                    "subunit definitions"
            ],
            "xlinkanalyzerVersion": "..."
        }


#. Add symmetry
   
    #. First, specify the series of symmetry-related molecules. Here, each of the three subunits is present in two symmetrical copies, so we add the series as below:
       
        .. code-block:: json

            {
                "series": [
                    {
                        "name": "2fold",
                        "subunit": "Elp1",
                        "mode": "input",
                        "cell_count": 2,
                        "tr3d": "2fold",
                        "inipos": "input"
                    },
                    {
                        "name": "2fold",
                        "subunit": "Elp2",
                        "mode": "auto",
                        "cell_count": 2,
                        "tr3d": "2fold",
                        "inipos": "input"
                    },
                    {
                        "name": "2fold",
                        "subunit": "Elp3",
                        "mode": "auto",
                        "cell_count": 2,
                        "tr3d": "2fold",
                        "inipos": "input"
                    }
                ],
                "data": [
                    {
                        "some xlink definition 1"
                    },
                    {
                        "some xlink definition 2"
                    },
                    {
                        "sequence file definition"
                    }
                ],
                "subunits": [
                        "subunit definitions"
                ],
                "xlinkanalyzerVersion": "..."
            }

    #. Second, define the coordinates of the symmetry axis:

        .. code-block:: json

            {
                "symmetry": {
                    "sym_tr3ds": [

                        {
                            "name": "2fold",
                            "axis": [0, 0, -1],
                            "center": [246.39112398, 246.41114644, 248.600000],
                            "type": "C2"   
                              
                        }

                    ]
                },
                "series": [
                    "the series"
                ],
                "data": [
                    {
                        "some xlink definition 1"
                    },
                    {
                        "some xlink definition 2"
                    },
                    {
                        "sequence file definition"
                    }
                ],
                "subunits": [
                        "subunit definitions"
                ],
                "xlinkanalyzerVersion": "..."
            }
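
    The same additions can also be made programmatically, which avoids hand-editing mistakes such as a missing comma between blocks. The following is a minimal sketch using Python's standard ``json`` module. The helper name ``add_symmetry`` is hypothetical (it is not an Assembline function), and the values are copied from the examples above.

    .. code-block:: python

        import json


        def add_symmetry(project, modes, axis, center, name="2fold"):
            """Return a copy of ``project`` with "symmetry" and "series" blocks.

            ``modes`` maps each subunit name to its series "mode".
            """
            series = [
                {
                    "name": name,
                    "subunit": subunit,
                    "mode": mode,
                    "cell_count": 2,
                    "tr3d": name,
                    "inipos": "input",
                }
                for subunit, mode in modes.items()
            ]
            symmetry = {
                "sym_tr3ds": [
                    {"name": name, "axis": axis, "center": center, "type": "C2"}
                ]
            }
            # New keys come first; keys already present in the project follow
            # (and would take precedence if they had the same name).
            return {"symmetry": symmetry, "series": series, **project}


        # Stand-in for the project loaded from elongator.json.
        project = {"data": [], "subunits": [], "xlinkanalyzerVersion": "..."}
        updated = add_symmetry(
            project,
            modes={"Elp1": "input", "Elp2": "auto", "Elp3": "auto"},
            axis=[0, 0, -1],
            center=[246.39112398, 246.41114644, 248.6],
        )
        print(json.dumps(updated, indent=4))

    Writing the result with ``json.dump`` always produces syntactically valid JSON, so only the values need to be checked by eye.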

#. Add specification of input PDB files

    The input structures for the tutorial are in the ``in_pdbs/`` directory::

        Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb
        Elp1_NTD_1st_propeller.pdb
        Elp1_NTD_2nd_propeller.pdb
        Elp2.pdb
        Elp3.mono.pdb

    Add them to the JSON like this:

    .. code-block:: json
    
        {
            "symmetry": {
                "symmetry axis definition"
            },
            "series": [
                "the series"
            ],
            "data": [
                {
                    "type": "pdb_files",
                    "name": "pdb_files",
                    "data": [
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "domain": "propeller1",
                                        "filename": "in_pdbs/Elp1_NTD_1st_propeller.pdb"
                                        }
                                    ]
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "domain": "propeller2",
                                        "filename": "in_pdbs/Elp1_NTD_2nd_propeller.pdb"
                                        }
                                    ]
                                },
                                {
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "serie": "2fold",
                                        "copies": [0],
                                        "chain_id": "G",
                                        "domain": "CTD",
                                        "filename": "in_pdbs/Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb"},

                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "serie": "2fold",
                                        "copies": [1],
                                        "chain_id": "H",
                                        "domain": "CTD",
                                        "filename": "in_pdbs/Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb"}                                
                                    ]
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp2",
                                        "subunit": "Elp2",
                                        "filename": "in_pdbs/Elp2.pdb"}
                                    ]
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp3",
                                        "subunit": "Elp3",
                                        "filename": "in_pdbs/Elp3.mono.pdb"}
                                    ]
                                }

                        ]

                },

                {
                    "some xlink definition 1"
                },
                {
                    "some xlink definition 2"
                },
                {
                    "sequence file definition"
                }
            ],
            "subunits": [
                    "subunit definitions"
            ],
            "xlinkanalyzerVersion": "..."
        }

    The ``foreach_serie`` and ``foreach_copy`` keys indicate that the given PDB file specification is applied to every series containing this subunit and to every copy within each series.

    All PDB selections within the same ``components`` block are grouped into a rigid body, unless a separate ``rigid_bodies`` block is specified and ``add_rbs_from_pdbs`` is set to ``False`` in :doc:`params_setup`.
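
    Because the file names are typed by hand, a quick check that each referenced file exists can catch typos early. This is a minimal sketch that assumes the ``pdb_files`` layout shown above. ``referenced_pdb_files`` and ``missing_files`` are hypothetical helpers, not Assembline functions.

    .. code-block:: python

        import os


        def referenced_pdb_files(project):
            """Collect the unique "filename" values from all pdb_files blocks."""
            filenames = set()
            for block in project.get("data", []):
                # Skip crosslink and sequence entries.
                if not isinstance(block, dict) or block.get("type") != "pdb_files":
                    continue
                for spec in block.get("data", []):
                    for component in spec.get("components", []):
                        filenames.add(component["filename"])
            return sorted(filenames)


        def missing_files(filenames, root="."):
            """Return the file names that do not exist relative to ``root``."""
            return [f for f in filenames if not os.path.isfile(os.path.join(root, f))]

    Calling ``missing_files(referenced_pdb_files(project))`` from the tutorial directory should return an empty list when all paths are correct.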

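#. Add pointers to fit libraries

    For each PDB file specification, add the path to its fit library (``positions``), the format of that file (``positions_type``), and the maximum number of positions to use (``max_positions``), as below: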

    .. code-block:: json
    
        {
            "symmetry": {
                "symmetry axis definition"
            },
            "series": [
                "the series"
            ],
            "data": [
                {
                    "type": "pdb_files",
                    "name": "pdb_files",
                    "data": [
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "domain": "propeller1",
                                        "filename": "in_pdbs/Elp1_NTD_1st_propeller.pdb"
                                        }
                                    ],
                                    "positions": "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc/Elp1_NTD_1st_propeller.pdb/solutions_pvalues.csv",
                                    "positions_type": "chimera",
                                    "max_positions": 10000
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "domain": "propeller2",
                                        "filename": "in_pdbs/Elp1_NTD_2nd_propeller.pdb"
                                        }
                                    ],
                                    "positions": "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc/Elp1_NTD_2nd_propeller.pdb/solutions_pvalues.csv",
                                    "positions_type": "chimera",
                                    "max_positions": 10000
                                },
                                {
                                    "components": [
                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "serie": "2fold",
                                        "copies": [0],
                                        "chain_id": "G",
                                        "domain": "CTD",
                                        "filename": "in_pdbs/Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb"},

                                        { "name": "Elp1",
                                        "subunit": "Elp1",
                                        "serie": "2fold",
                                        "copies": [1],
                                        "chain_id": "H",
                                        "domain": "CTD",
                                        "filename": "in_pdbs/Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb"}                                
                                    ],
                                    "positions": "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc/Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb/solutions_pvalues.csv",
                                    "positions_type": "chimera",
                                    "max_positions": 1
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp2",
                                        "subunit": "Elp2",
                                        "filename": "in_pdbs/Elp2.pdb"}
                                    ],
                                    "positions": "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc/Elp2.pdb/solutions_pvalues.csv",
                                    "positions_type": "chimera",
                                    "max_positions": 10000
                                },
                                {
                                    "foreach_serie": true,
                                    "foreach_copy": true,
                                    "components": [
                                        { "name": "Elp3",
                                        "subunit": "Elp3",
                                        "filename": "in_pdbs/Elp3.mono.pdb"}
                                    ],
                                    "positions": "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc/Elp3.mono.pdb/solutions_pvalues.csv",
                                    "positions_type": "chimera",
                                    "max_positions": 10000
                                }

                        ]

                },

                {
                    "some xlink definition 1"
                },
                {
                    "some xlink definition 2"
                },
                {
                    "sequence file definition"
                }
            ],
            "subunits": [
                    "subunit definitions"
            ],
            "xlinkanalyzerVersion": "..."
        }
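
In the example above, every ``positions`` path follows the same pattern: the fit directory, then the base name of the PDB file, then ``solutions_pvalues.csv``. A minimal sketch that fills in these keys is shown below. ``add_fit_positions`` is a hypothetical helper, not part of Assembline, and it assumes that all components of one specification share the same PDB file, as they do in this tutorial.

.. code-block:: python

    import posixpath

    FITS_DIR = "fits/search100000_metric_cam_inside0.6/emd_4151_binned.mrc"


    def add_fit_positions(specs, fits_dir=FITS_DIR, max_positions=10000, overrides=None):
        """Add the three position keys to each PDB file specification in place.

        ``overrides`` maps a PDB base name to a custom ``max_positions``.
        """
        overrides = overrides or {}
        for spec in specs:
            # Use the first component's file to locate the fit library.
            pdb_name = posixpath.basename(spec["components"][0]["filename"])
            spec["positions"] = posixpath.join(fits_dir, pdb_name, "solutions_pvalues.csv")
            spec["positions_type"] = "chimera"
            spec["max_positions"] = overrides.get(pdb_name, max_positions)
        return specs

Here ``specs`` is the inner ``data`` list of the ``pdb_files`` block. Passing ``overrides={"Elp1.CTD.on5cqs.5cqr.model_ElNemo_mode7.pdb": 1}`` reproduces the single position used for the Elp1 CTD above.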


And that's it!