gohan

module
v3.6.1+incompatible
Published: Dec 13, 2022 License: LGPL-3.0

Gohan - A Genomic Variants API

Prerequisites

TL;DR

Typical use-case walkthrough
  # environment
  cp ./etc/example.env .env # modify to your needs

  # kickstart dockerized gohan environment
  make init

  # (optional): if you plan on modifying the api codebase before deploying
  make init-dev

  # gateway & certificates
  mkdir -p gateway/certs/dev

  openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
  openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt


  # build services
  make build-gateway 
  make build-drs
  make build-api 

  # run services
  make run-gateway
  make run-elasticsearch
  make run-drs
  make run-api
  
  
  # initiate genes catalogue:
  curl -k https://gohan.local/genes/ingestion/run
  
  # monitor progress:
  curl -k https://gohan.local/genes/ingestion/requests
  curl -k https://gohan.local/genes/ingestion/stats

  # view catalogue
  curl -k https://gohan.local/genes/overview


  # create table
  DATA='{
      "name": "Gohan Box Test Table",
      "data_type": "variant",
      "dataset": "00000000-0000-0000-0000-000000000000",
      "metadata": {}
  }'
  curl -k -0 -v -X POST https://gohan.local/tables \
    -H 'Content-Type:application/json' \
    --data "$(echo $DATA)" | jq

  # <obtain the table "id">


  # move vcf.gz files to `$GOHAN_API_VCF_PATH`

  # ingest vcf.gz
  curl -k https://gohan.local/variants/ingestion/run\?fileNames=<filename>\&assemblyId=GRCh37\&filterOutHomozygousReferences=true\&tableId=<table id>
  
  # monitor progress:
  curl -k https://gohan.local/variants/ingestion/requests
  curl -k https://gohan.local/variants/ingestion/stats

  # view variants
  curl -k https://gohan.local/variants/overview

Getting started

Environment :

First, from the project root, create a local file for environment variables with default settings by running

cp ./etc/example.env .env

and make any necessary changes, such as the Elasticsearch GOHAN_ES_USERNAME and GOHAN_ES_PASSWORD when in production.

Note: due to a known bug, GOHAN_ES_USERNAME must currently remain at its default value.
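
A missing credential may only show up later as a connection error, so it can help to check the environment before starting the services. The Go sketch below is not part of Gohan: `missingEnv` is a hypothetical helper, and only the two variable names are taken from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names that are unset or empty according to lookup.
// lookup has the same signature as os.LookupEnv so it can be replaced in tests.
func missingEnv(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Variable names as mentioned above; load .env into the shell first.
	missing := missingEnv(os.LookupEnv, "GOHAN_ES_USERNAME", "GOHAN_ES_PASSWORD")
	if len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("Elasticsearch credentials are set")
}
```

Run it after `set -a; . ./.env; set +a` so the values from `.env` are visible to the process.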


Initialization

Run

make init

Elasticsearch & Kibana :

Run

make run-elasticsearch 

and (optionally)

make run-kibana

DRS :

Run

make build-drs
make run-drs

Data Access Authorization with OPA (more on this to come) :

Run

make build-authz
make run-authz


Development

Gateway

To create and use development certs from the project root, run

mkdir -p gateway/certs/dev

openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt

Note: Ensure your CN matches the hostname (gohan.local by default)
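
To confirm which CN ended up in a generated certificate, one option is to read it back. The following Go sketch is an illustration only: `certCommonName` is a hypothetical helper, the file path is the one used in the commands above, and `gohan.local` is the default hostname from the note. It checks the CN only, as the note describes; many current TLS clients validate the Subject Alternative Name instead.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// certCommonName returns the subject CN of the first PEM certificate in pemBytes.
func certCommonName(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("no PEM certificate block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return "", err
	}
	return cert.Subject.CommonName, nil
}

func main() {
	// Path taken from the openssl command above; run from the project root.
	const path = "gateway/certs/dev/gohan_fullchain1.crt"
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("could not read certificate:", err)
		return
	}
	cn, err := certCommonName(data)
	if err != nil {
		fmt.Println("could not parse certificate:", err)
		return
	}
	fmt.Printf("CN=%q matches gohan.local: %v\n", cn, cn == "gohan.local")
}
```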

These will be incorporated into the Gateway service (using NGINX by default, see gateway/Dockerfile and gateway/nginx.conf for details). Be sure to add the hostname of your choice to your local hosts file: /etc/hosts (on Linux) or C:\Windows\System32\drivers\etc\hosts (on Windows).

Next, run

make build-gateway
make run-gateway

API

Containerized :

 To simply run a working instance of the api "out of the box", build the docker image and spawn the container with a fresh binary build by running

make build-api
make run-api

 and the docker-compose.yaml file will handle the configuration.


Local Development :

 This can be done in multiple ways.

  1. Terminal : From the project root, run
# load variables from local file
set -a
. ./.env
set +a

cd src/api

go run .
  2. IDE (preferably VSCode)
- follow the recommended instructions listed at https://code.visualstudio.com/docs/languages/go

- configure the `.vscode/launch.json` to inject the above mentioned variables as recommended by https://stackoverflow.com/questions/29971572/how-do-i-add-environment-variables-to-launch-json-in-vscode

- click 'Run & Debug' > "Play" 

Local Release

 To build / test from source:

make build-api-local-binaries

 The binary can then be found at bin/api_${GOOS}_${GOARCH} and executed locally with

# load variables from local file
set -a
. ./.env
set +a

# navigate to binary directory
cd bin/

# execute binary
./api_${GOOS}_${GOARCH}

Endpoints :

/variants

Request

  GET /variants/overview
   params: none


Response

{
    "chromosomes": {
        "<CHROMOSOME>": `number`,
        ...
    },
    "sampleIDs": {
        "<SAMPLEID>": `number`,
        ...
    },
    "variantIDs": {
        "<VARIANTID>": `number`,
        ...
    }
}


Example :

{
    "chromosomes": {
        "21": 90548
    },
    "sampleIDs": {
        "hg00096": 33664,
        "hg00099": 31227,
        "hg00111": 25657
    },
    "variantIDs": {
        ".": 90548
    }
}
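
For clients written in Go, the overview response maps naturally onto three string-to-count maps. The sketch below decodes the example above; the type and function names (`Overview`, `parseOverview`, `total`) are illustrative and not taken from the Gohan codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Overview mirrors the documented /variants/overview response body.
type Overview struct {
	Chromosomes map[string]int `json:"chromosomes"`
	SampleIDs   map[string]int `json:"sampleIDs"`
	VariantIDs  map[string]int `json:"variantIDs"`
}

// parseOverview decodes a response body into an Overview.
func parseOverview(body []byte) (Overview, error) {
	var o Overview
	err := json.Unmarshal(body, &o)
	return o, err
}

// total sums the counts of one overview map.
func total(counts map[string]int) int {
	sum := 0
	for _, n := range counts {
		sum += n
	}
	return sum
}

func main() {
	// Body copied from the example response above.
	body := []byte(`{
		"chromosomes": {"21": 90548},
		"sampleIDs": {"hg00096": 33664, "hg00099": 31227, "hg00111": 25657},
		"variantIDs": {".": 90548}
	}`)
	o, err := parseOverview(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(o.SampleIDs), total(o.Chromosomes)) // prints: 3 90548
}
```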



Requests

  GET /variants/get/by/variantId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string an allele ( "A" | "C" | "G" | "T" | "N" or some combination thereof )
  • alternative : string an allele
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
  • getSampleIdsOnly : bool (optional) - default: false

  GET /variants/count/by/variantId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string an allele
  • alternative : string an allele
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/get/by/sampleId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string an allele
  • alternative : string an allele
  • ids : string (comma-delimited list of sample ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/count/by/sampleId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string an allele
  • alternative : string an allele
  • ids : string (comma-delimited list of sample ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
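
The four endpoints above share most of their parameters, so a client can build the query string in one place. The following Go sketch covers a subset of the documented parameters. `VariantQuery` is a hypothetical type, and treating zero values as "not set" is an assumption of this sketch rather than documented server behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// VariantQuery holds a subset of the documented query parameters.
// Zero values are treated as "not set" and left out of the query string.
type VariantQuery struct {
	Chromosome string
	LowerBound int
	UpperBound int
	IDs        []string // sample IDs or variant IDs, depending on the endpoint
	Genotype   string
	Size       int
}

// Encode returns the URL-encoded query string, with keys sorted by name.
func (q VariantQuery) Encode() string {
	v := url.Values{}
	if q.Chromosome != "" {
		v.Set("chromosome", q.Chromosome)
	}
	if q.LowerBound > 0 {
		v.Set("lowerBound", strconv.Itoa(q.LowerBound))
	}
	if q.UpperBound > 0 {
		v.Set("upperBound", strconv.Itoa(q.UpperBound))
	}
	if len(q.IDs) > 0 {
		v.Set("ids", strings.Join(q.IDs, ","))
	}
	if q.Genotype != "" {
		v.Set("genotype", q.Genotype)
	}
	if q.Size > 0 {
		v.Set("size", strconv.Itoa(q.Size))
	}
	return v.Encode()
}

func main() {
	q := VariantQuery{
		Chromosome: "21",
		IDs:        []string{"hg00096", "hg00099"},
		Genotype:   "HETEROZYGOUS",
		Size:       10,
	}
	// gohan.local is the default hostname used throughout this README.
	fmt.Println("https://gohan.local/variants/get/by/sampleId?" + q.Encode())
}
```

The comma in `ids` is percent-encoded as `%2C` by `url.Values.Encode`, which a standards-compliant server decodes back to a comma.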

Generalized Response Body Structure

{
    "status":  `number` (200 - 500),
    "message": `string` ("Success" | "Error"),
    "results": [
        {
            "query":  `string`,       // reflective of the type of id queried for, e.g. 'variantId:abc123' or 'sampleId:HG0001'
            "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),    // reflective of the assembly id queried for
            "count":  `number`,   // this field is only present when performing a COUNT query
            "start":  `number`,   // reflective of the provided lowerBound parameter, 0 if none
            "end":  `number`,     // reflective of the provided upperBound parameter, 0 if none
            "chromosome":  `string`,       // reflective of the chromosome queried for
            "calls": [            // this field is only present when performing a GET query
                {
                   "id": `string`, // variantId
                   "chrom":  `string`,
                   "pos": `number`,
                   "ref": `[]string`,  // list of alleles
                   "alt": `[]string`,  // list of alleles
                   "info": [
                       {
                           "id": `string`,
                           "value": `string`,
                       },
                       ...
                   ],
                   "format":`string`,
                   "qual": `number`,
                   "filter": `string`,
                   "sampleId": `string`,
                   "genotype_type": `string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )`,
                   "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),
                },
                ...
            ]
        },
    ]
}
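
Because "count" and "calls" are each present for only one kind of query, a client needs to handle both shapes. The Go sketch below shows one way to do that with a subset of the documented fields. The type names and the `hits` helper are illustrative, and the payload in `main` is hand-written to match the structure above, not real Gohan output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Call holds a subset of the documented per-call fields.
type Call struct {
	ID           string   `json:"id"`
	Chrom        string   `json:"chrom"`
	Pos          int      `json:"pos"`
	Ref          []string `json:"ref"`
	Alt          []string `json:"alt"`
	SampleID     string   `json:"sampleId"`
	GenotypeType string   `json:"genotype_type"`
}

// Result is one entry of "results". Count is a pointer because the field is
// only present for COUNT queries; Calls is only present for GET queries.
type Result struct {
	Query      string `json:"query"`
	AssemblyID string `json:"assemblyId"`
	Count      *int   `json:"count"`
	Start      int    `json:"start"`
	End        int    `json:"end"`
	Chromosome string `json:"chromosome"`
	Calls      []Call `json:"calls"`
}

// Response is the generalized response body.
type Response struct {
	Status  int      `json:"status"`
	Message string   `json:"message"`
	Results []Result `json:"results"`
}

// hits returns the number of matches per query string, using "count" when
// it was sent and the length of "calls" otherwise.
func hits(body []byte) (map[string]int, error) {
	var resp Response
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make(map[string]int, len(resp.Results))
	for _, r := range resp.Results {
		if r.Count != nil {
			out[r.Query] = *r.Count
		} else {
			out[r.Query] = len(r.Calls)
		}
	}
	return out, nil
}

func main() {
	// Hand-written payload for illustration only.
	body := []byte(`{"status":200,"message":"Success","results":[
		{"query":"sampleId:HG0001","assemblyId":"GRCh37","chromosome":"21","count":42},
		{"query":"variantId:abc123","assemblyId":"GRCh37","chromosome":"21","calls":[
			{"id":"abc123","chrom":"21","pos":100,"ref":["A"],"alt":["T"],
			 "sampleId":"HG0001","genotype_type":"HETEROZYGOUS"}]}]}`)
	h, err := hits(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(h["sampleId:HG0001"], h["variantId:abc123"]) // prints: 42 1
}
```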

Examples :





Request

  GET /variants/ingestion/run
   params:

  • filename : string (required)

Response

{
    "state":  `number` ("Queuing" | "Running" | "Done" | "Error"),
    "id": `string`,
    "filename": `string`,
    "message": `string`,
}


Request

  GET /variants/ingestion/requests
   params: none


Response

[
  {
    "state":  `number` ("Queuing" | "Running" | "Done" | "Error"),
    "id": `string`,
    "filename": `string`,
    "message": `string`,
    "createdAt": `timestamp string`,
    "updatedAt": `timestamp string`
  },
  ...
]
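
When monitoring an ingestion, the usual question is which requests are still in flight. The Go sketch below filters the list for unfinished entries. The structure above gives the type of "state" as `number` while listing string values; this sketch assumes the string form is what arrives on the wire, so the field type would need to change if the server sends a numeric code. `IngestionRequest` and `pending` are illustrative names, and the payload is hand-written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IngestionRequest mirrors one entry of the /variants/ingestion/requests list.
// State is assumed to be one of "Queuing", "Running", "Done" or "Error".
type IngestionRequest struct {
	State     string `json:"state"`
	ID        string `json:"id"`
	Filename  string `json:"filename"`
	Message   string `json:"message"`
	CreatedAt string `json:"createdAt"` // timestamp format is not specified above
	UpdatedAt string `json:"updatedAt"`
}

// pending returns the requests that have not yet reached "Done" or "Error".
func pending(body []byte) ([]IngestionRequest, error) {
	var reqs []IngestionRequest
	if err := json.Unmarshal(body, &reqs); err != nil {
		return nil, err
	}
	var out []IngestionRequest
	for _, r := range reqs {
		if r.State == "Queuing" || r.State == "Running" {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	// Hand-written payload for illustration only.
	body := []byte(`[
		{"state":"Done","id":"1","filename":"a.vcf.gz","message":""},
		{"state":"Running","id":"2","filename":"b.vcf.gz","message":""}
	]`)
	p, err := pending(body)
	if err != nil {
		panic(err)
	}
	for _, r := range p {
		fmt.Println(r.ID, r.Filename, r.State)
	}
}
```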


/tables


Request

  GET /tables


Response

[
  {
    "id":           `string`,
    "name":         `string`,
    "data_type":    `string`,
    "dataset":      `string`,
    "assembly_ids": `[]string`,
    "metadata":     {...},
    "schema":       {...},
  },
  ...
]


Request

  POST /tables

{
   "name":           `string`,
   "data_type":      `string`,
   "dataset":        `string`,
   "metadata":        {...},
}

Response

{
   "id":             `string`,
   "name":           `string`,
   "data_type":      `string`,
   "dataset":        `string`,
   "assembly_ids": `[]string`,
   "metadata":        {...},
   "schema":          {...},
}
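
As a Go counterpart to the curl call in the walkthrough, the sketch below builds the POST /tables request without sending it. `TableRequest` and `newCreateTableRequest` are hypothetical names; the field values are the ones from the walkthrough.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// TableRequest mirrors the documented POST /tables request body.
type TableRequest struct {
	Name     string                 `json:"name"`
	DataType string                 `json:"data_type"`
	Dataset  string                 `json:"dataset"`
	Metadata map[string]interface{} `json:"metadata"`
}

// newCreateTableRequest builds (but does not send) the POST /tables request.
func newCreateTableRequest(baseURL string, t TableRequest) (*http.Request, error) {
	body, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/tables", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateTableRequest("https://gohan.local", TableRequest{
		Name:     "Gohan Box Test Table",
		DataType: "variant",
		Dataset:  "00000000-0000-0000-0000-000000000000",
		Metadata: map[string]interface{}{}, // empty map encodes as {} rather than null
	})
	if err != nil {
		panic(err)
	}
	body, err := io.ReadAll(req.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(string(body))
}
```

Passing the request to an `http.Client` and decoding the response into a struct with an `id` field would then yield the table id needed for ingestion.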


Request

  GET /tables/:id
   path params:

  • id : string (UUID) (required)

Response

{
   "id":             `string`,
   "name":           `string`,
   "data_type":      `string`,
   "dataset":        `string`,
   "assembly_ids": `[]string`,
   "metadata":        {...},
   "schema":          {...},
}


Request

  GET /tables/:id/summary
   path params:

  • id : string (UUID) (required)

Response

{
   "count":               `int`,
   "data_type_specific":  {...},
}


Request

  DELETE /tables/:id
   path params:

  • id : string (UUID) (required)

Response

Status Code: 204
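
Since a successful delete returns no body, a client has only the status code to go on. The Go sketch below treats anything other than 204 as an error. The function names are illustrative, and `main` uses a stand-in `httptest` server so the example runs without a Gohan instance.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// tableURL returns the URL of a single table, escaping the id for use in a path.
func tableURL(baseURL, id string) string {
	return baseURL + "/tables/" + url.PathEscape(id)
}

// checkDeleteStatus reports an error for any status other than 204 No Content.
func checkDeleteStatus(code int) error {
	if code != http.StatusNoContent {
		return fmt.Errorf("unexpected status %d, want %d", code, http.StatusNoContent)
	}
	return nil
}

// deleteTable issues DELETE /tables/:id and checks the status code.
func deleteTable(client *http.Client, baseURL, id string) error {
	req, err := http.NewRequest(http.MethodDelete, tableURL(baseURL, id), nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return checkDeleteStatus(resp.StatusCode)
}

func main() {
	// Stand-in server that answers DELETE with 204, as documented above.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodDelete {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		w.WriteHeader(http.StatusMethodNotAllowed)
	}))
	defer srv.Close()

	err := deleteTable(srv.Client(), srv.URL, "00000000-0000-0000-0000-000000000000")
	fmt.Println("deleted:", err == nil) // prints: deleted: true
}
```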



Deployments :

All in all, run

make run-elasticsearch 
make run-drs
make build-gateway && make run-gateway 
make build-api && make run-api

# and optionally
make run-kibana

For other handy tools, see the Makefile. Besides the commands already mentioned here, you'll find other build, run, stop and clean-up commands.


Tests :

Once elasticsearch, drs, the api, and the gateway are up, run

make test-api-dev
