(git:b77b4be)
DBT tensor framework for block-sparse tensor contraction. Representation of n-rank tensors as DBT tall-and-skinny matrices. Support for arbitrary redistribution between different representations. Support for arbitrary tensor contractions.
Functions/Subroutines | |
subroutine, public | dbt_copy (tensor_in, tensor_out, order, summation, bounds, move_data, unit_nr) |
Copy tensor data. Redistributes tensor data according to the distributions of the target and source tensors. Permutes the tensor index according to the order argument (if present). Source and target tensor formats are arbitrary, subject to the size requirements listed in the detailed description. | |
subroutine, public | dbt_copy_matrix_to_tensor (matrix_in, tensor_out, summation) |
Copy matrix to tensor. | |
subroutine, public | dbt_copy_tensor_to_matrix (tensor_in, matrix_out, summation) |
Copy tensor to matrix. | |
subroutine, public | dbt_contract (alpha, tensor_1, tensor_2, beta, tensor_3, contract_1, notcontract_1, contract_2, notcontract_2, map_1, map_2, bounds_1, bounds_2, bounds_3, optimize_dist, pgrid_opt_1, pgrid_opt_2, pgrid_opt_3, filter_eps, flop, move_data, retain_sparsity, unit_nr, log_verbose) |
Contract tensors by multiplying matrix representations. tensor_3(map_1, map_2) := alpha * tensor_1(notcontract_1, contract_1) * tensor_2(contract_2, notcontract_2) + beta * tensor_3(map_1, map_2) | |
subroutine, public | dbt_batched_contract_init (tensor, batch_range_1, batch_range_2, batch_range_3, batch_range_4) |
Initialize batched contraction for this tensor. | |
subroutine, public | dbt_batched_contract_finalize (tensor, unit_nr) |
Finalize batched contraction. This performs all communication that has been postponed in the contraction calls. | |
subroutine, public dbt_methods::dbt_copy | ( | type(dbt_type), intent(inout), target | tensor_in, |
type(dbt_type), intent(inout), target | tensor_out, | ||
integer, dimension(ndims_tensor(tensor_in)), intent(in), optional | order, | ||
logical, intent(in), optional | summation, | ||
integer, dimension(2, ndims_tensor(tensor_in)), intent(in), optional | bounds, | ||
logical, intent(in), optional | move_data, | ||
integer, intent(in), optional | unit_nr | ||
) |
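Copy tensor data. Redistributes tensor data according to the distributions of the target and source tensors. Permutes the tensor index according to the order argument (if present). Source and target tensor formats are arbitrary as long as one of the following requirements is met:
- source and target tensors have the same rank and the same sizes in each dimension in terms of tensor elements (block sizes need not be the same). If the order argument is present, sizes must match after index permutation. OR
- the target tensor is not yet created; in this case an exact copy of the source tensor is returned.
tensor_in | Source |
tensor_out | Target |
order | Permutation of target tensor index. Exact same convention as order argument of RESHAPE intrinsic. |
bounds | Crop tensor data: start and end index for each tensor dimension |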
Definition at line 113 of file dbt_methods.F.
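The order convention of dbt_copy can be illustrated with a small model in plain Python. This is a sketch, not DBT code: the dictionary-based tensor and the helper `permute_index` are hypothetical stand-ins, and the sketch assumes the permutation behaves as RESHAPE's order argument does for an array of unchanged rank, where source dimension k becomes target dimension order(k).

```python
def permute_index(tensor_in, order):
    """Reorder the index of a sparse tensor using the RESHAPE `order` convention.

    tensor_in: dict mapping 1-based index tuples to values (a hypothetical
               stand-in for a tensor, not the DBT data structure).
    order:     1-based permutation; source dimension k becomes target
               dimension order[k].
    """
    tensor_out = {}
    for idx_in, value in tensor_in.items():
        idx_out = [0] * len(order)
        for k, target_dim in enumerate(order):
            idx_out[target_dim - 1] = idx_in[k]
        tensor_out[tuple(idx_out)] = value
    return tensor_out


# A rank-3 tensor with a single nonzero element at (1, 2, 3).
source = {(1, 2, 3): 7.0}
target = permute_index(source, order=(3, 1, 2))
print(target)  # {(2, 3, 1): 7.0}
```

With order = (3, 1, 2), the first source index lands in the third target position, the second in the first, and the third in the second, which is why the sizes of source and target must match only after the permutation.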
subroutine, public dbt_methods::dbt_copy_matrix_to_tensor | ( | type(dbcsr_type), intent(in), target | matrix_in, |
type(dbt_type), intent(inout) | tensor_out, | ||
logical, intent(in), optional | summation | ||
) |
Copy matrix to tensor.
summation | tensor_out = tensor_out + matrix_in |
Definition at line 324 of file dbt_methods.F.
subroutine, public dbt_methods::dbt_copy_tensor_to_matrix | ( | type(dbt_type), intent(inout) | tensor_in, |
type(dbcsr_type), intent(inout) | matrix_out, | ||
logical, intent(in), optional | summation | ||
) |
Copy tensor to matrix.
summation | matrix_out = matrix_out + tensor_in |
Definition at line 384 of file dbt_methods.F.
subroutine, public dbt_methods::dbt_contract | ( | real(dp), intent(in) | alpha, |
type(dbt_type), intent(inout), target | tensor_1, | ||
type(dbt_type), intent(inout), target | tensor_2, | ||
real(dp), intent(in) | beta, | ||
type(dbt_type), intent(inout), target | tensor_3, | ||
integer, dimension(:), intent(in) | contract_1, | ||
integer, dimension(:), intent(in) | notcontract_1, | ||
integer, dimension(:), intent(in) | contract_2, | ||
integer, dimension(:), intent(in) | notcontract_2, | ||
integer, dimension(:), intent(in) | map_1, | ||
integer, dimension(:), intent(in) | map_2, | ||
integer, dimension(2, size(contract_1)), intent(in), optional | bounds_1, | ||
integer, dimension(2, size(notcontract_1)), intent(in), optional | bounds_2, | ||
integer, dimension(2, size(notcontract_2)), intent(in), optional | bounds_3, | ||
logical, intent(in), optional | optimize_dist, | ||
type(dbt_pgrid_type), intent(out), optional, pointer | pgrid_opt_1, | ||
type(dbt_pgrid_type), intent(out), optional, pointer | pgrid_opt_2, | ||
type(dbt_pgrid_type), intent(out), optional, pointer | pgrid_opt_3, | ||
real(kind=dp), intent(in), optional | filter_eps, | ||
integer(kind=int_8), intent(out), optional | flop, | ||
logical, intent(in), optional | move_data, | ||
logical, intent(in), optional | retain_sparsity, | ||
integer, intent(in), optional | unit_nr, | ||
logical, intent(in), optional | log_verbose | ||
) |
Contract tensors by multiplying matrix representations.
tensor_3(map_1, map_2) := alpha * tensor_1(notcontract_1, contract_1) * tensor_2(contract_2, notcontract_2) + beta * tensor_3(map_1, map_2)
note 2: for best performance the tensors should have been created in matrix layouts compatible with the contraction, e.g. tensor_1 should have been created with either map1_2d == contract_1 and map2_2d == notcontract_1 or map1_2d == notcontract_1 and map2_2d == contract_1 (the same with tensor_2 and contract_2 / notcontract_2 and with tensor_3 and map_1 / map_2). Furthermore the two largest tensors involved in the contraction should map both to either tall or short matrices: the largest matrix dimension should be "on the same side" and should have identical distribution (which is always the case if the distributions were obtained with dbt_default_distvec).
note 3: if the same tensor occurs in multiple contractions, a different tensor object should be created for each contraction and the data should be copied between the tensors by use of dbt_copy. If the same tensor object is used in multiple contractions, matrix layouts are not compatible for all contractions (see note 2).
note 4: automatic optimizations are enabled by using the feature of batched contraction, see dbt_batched_contract_init, dbt_batched_contract_finalize. The arguments bounds_1, bounds_2, bounds_3 give the index ranges of the batches.
tensor_1 | first tensor (in) |
tensor_2 | second tensor (in) |
contract_1 | indices of tensor_1 to contract |
contract_2 | indices of tensor_2 to contract (1:1 with contract_1) |
map_1 | which indices of tensor_3 map to non-contracted indices of tensor_1 (1:1 with notcontract_1) |
map_2 | which indices of tensor_3 map to non-contracted indices of tensor_2 (1:1 with notcontract_2) |
notcontract_1 | indices of tensor_1 not to contract |
notcontract_2 | indices of tensor_2 not to contract |
tensor_3 | contracted tensor (out) |
bounds_1 | bounds corresponding to contract_1 AKA contract_2: start and end index of an index range over which to contract. For use in batched contraction. |
bounds_2 | bounds corresponding to notcontract_1: start and end index of an index range. For use in batched contraction. |
bounds_3 | bounds corresponding to notcontract_2: start and end index of an index range. For use in batched contraction. |
optimize_dist | Whether distribution should be optimized internally. In the current implementation this guarantees optimal parameters only for dense matrices. |
pgrid_opt_1 | Optionally return optimal process grid for tensor_1. This can be used to choose optimal process grids for subsequent tensor contractions with tensors of similar shape and sparsity. Under some conditions, pgrid_opt_1 cannot be returned; in this case the pointer is not associated. |
pgrid_opt_2 | Optionally return optimal process grid for tensor_2. |
pgrid_opt_3 | Optionally return optimal process grid for tensor_3. |
filter_eps | As in DBM mm |
flop | As in DBM mm |
move_data | memory optimization: transfer data such that tensor_1 and tensor_2 are empty on return |
retain_sparsity | enforce the sparsity pattern of the existing tensor_3; default is no |
unit_nr | Output unit for logging. Set it to -1 on ranks that should not write, and to any valid unit number on ranks that should write output. If it is 0 on ALL ranks, no output is written. |
log_verbose | verbose logging (for testing only) |
Definition at line 491 of file dbt_methods.F.
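How contract_1, notcontract_1, contract_2, notcontract_2, map_1 and map_2 fit together in dbt_contract can be sketched with a naive reference model in plain Python. This is an illustration under stated assumptions, not the DBT algorithm: tensors are hypothetical dictionaries from 1-based index tuples to values, the helper `contract` is invented for this sketch, and no blocking, distribution, filtering or bounds handling is modelled.

```python
def contract(alpha, tensor_1, tensor_2, beta, tensor_3,
             contract_1, notcontract_1, contract_2, notcontract_2,
             map_1, map_2):
    """Return alpha * tensor_1 * tensor_2 + beta * tensor_3 as a new dict.

    Tensors are dicts mapping 1-based index tuples to values; missing entries
    are zero. All index lists are 1-based, mirroring the argument names of
    dbt_contract (this helper itself is hypothetical).
    """
    rank_3 = len(map_1) + len(map_2)
    result = {idx: beta * value for idx, value in tensor_3.items()}
    for idx_1, value_1 in tensor_1.items():
        for idx_2, value_2 in tensor_2.items():
            # Contracted indices must coincide pairwise
            # (contract_1 is 1:1 with contract_2).
            if any(idx_1[c1 - 1] != idx_2[c2 - 1]
                   for c1, c2 in zip(contract_1, contract_2)):
                continue
            idx_3 = [0] * rank_3
            # map_1 places the non-contracted indices of tensor_1 in tensor_3.
            for dim_3, dim_1 in zip(map_1, notcontract_1):
                idx_3[dim_3 - 1] = idx_1[dim_1 - 1]
            # map_2 places the non-contracted indices of tensor_2 in tensor_3.
            for dim_3, dim_2 in zip(map_2, notcontract_2):
                idx_3[dim_3 - 1] = idx_2[dim_2 - 1]
            key = tuple(idx_3)
            result[key] = result.get(key, 0.0) + alpha * value_1 * value_2
    return result


# Matrix product C = A * B as a contraction over A's 2nd and B's 1st index.
a = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}
b = {(1, 1): 5.0, (1, 2): 6.0, (2, 1): 7.0, (2, 2): 8.0}
c = contract(1.0, a, b, 0.0, {},
             contract_1=[2], notcontract_1=[1],
             contract_2=[1], notcontract_2=[2],
             map_1=[1], map_2=[2])
print(c)  # {(1, 1): 19.0, (1, 2): 22.0, (2, 1): 43.0, (2, 2): 50.0}
```

The ordinary matrix product is the simplest instance; higher-rank contractions only change the lengths of the index lists.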
subroutine, public dbt_methods::dbt_batched_contract_init | ( | type(dbt_type), intent(inout) | tensor, |
integer, dimension(:), intent(in), optional | batch_range_1, | ||
integer, dimension(:), intent(in), optional | batch_range_2, | ||
integer, dimension(:), intent(in), optional | batch_range_3, | ||
integer, dimension(:), intent(in), optional | batch_range_4 | ||
) |
Initialize batched contraction for this tensor.
Explanation: A batched contraction is a contraction performed in several consecutive steps by specification of bounds in dbt_contract. This can be used to reduce memory by a large factor. The routines dbt_batched_contract_init and dbt_batched_contract_finalize should be called to define the scope of a batched contraction, as this enables important optimizations (adapting the communication scheme to batches and adapting the process grid to the multiplication algorithm). Both routines must be called on all 3 tensors: dbt_batched_contract_init before the first contraction step and dbt_batched_contract_finalize after the last.
Requirements:
- the tensors are in a compatible matrix layout (see documentation of `dbt_contract`, notes 2 and 3). If they are not, process grid optimizations are disabled and a warning is issued.
- within the scope of a batched contraction, tensor data must not be accessed or changed except by calling the routines dbt_contract and dbt_copy.
- the bounds affecting indices of the smallest tensor must not change in the course of a batched contraction (todo: get rid of this requirement).
Side effects:
- the parallel layout (process grid and distribution) of all tensors may change. In order to disable the process grid optimization including this side effect, call this routine only on the smallest of the 3 tensors.
Note 1: for an example of batched contraction, see examples/dbt_example.F (todo: the example is outdated and should be updated).
Note 2: this feature is also useful if the contraction consists of one batch only but multiple contractions involving the same 3 tensors are performed (dbt_batched_contract_init and dbt_batched_contract_finalize must then be called before/after each contraction call). The process grid is then optimized after the first contraction, and subsequent contractions may profit from this optimization.
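The idea of performing one contraction in several consecutive steps can be sketched in plain Python with nested-list matrices. This is only a numerical illustration, not DBT code: the helper `matmul_range` is hypothetical, and the sketch assumes that each step accumulates its partial result into the target (the role a beta of 1 would play), which the text above does not state explicitly.

```python
def matmul_range(a, b, k_first, k_last):
    """Partial product of nested-list matrices, summing the contracted index
    over the inclusive 1-based range k_first..k_last only."""
    n_rows, n_cols = len(a), len(b[0])
    return [[sum(a[i][k - 1] * b[k - 1][j]
                 for k in range(k_first, k_last + 1))
             for j in range(n_cols)]
            for i in range(n_rows)]


a = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
b = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 0.0],
     [0.0, 2.0]]

# Two batches over the contracted index: 1..2 and 3..4.
batches = [(1, 2), (3, 4)]
c_batched = [[0.0, 0.0], [0.0, 0.0]]
for k_first, k_last in batches:
    partial = matmul_range(a, b, k_first, k_last)
    # Accumulate this step into the result (assumed accumulation, see lead-in).
    c_batched = [[c + p for c, p in zip(row_c, row_p)]
                 for row_c, row_p in zip(c_batched, partial)]

c_full = matmul_range(a, b, 1, 4)
print(c_batched == c_full)  # True
```

Each step touches only the slice of the operands selected by its bounds, which is where the memory saving of a batched contraction comes from.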
batch_range_i | For internal load balancing optimizations, optionally specify the index ranges of batched contraction. batch_range_i refers to the ith tensor dimension and contains all block indices starting a new range. The size should be the number of ranges plus one, the last element being the block index plus one of the last block in the last range. |
Definition at line 2090 of file dbt_methods.F.
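The layout of a batch_range_i array, as described for dbt_batched_contract_init, can be sketched in plain Python. The helpers `make_batch_range` and `split_batch_range` are hypothetical and exist only for this illustration; the sketch also assumes that the ranges are contiguous and sorted by block index.

```python
def make_batch_range(ranges):
    """Build a batch_range array from inclusive (first_block, last_block) pairs.

    Assumption of this sketch: ranges are contiguous and sorted by block index.
    """
    for (_, last), (next_first, _) in zip(ranges, ranges[1:]):
        if next_first != last + 1:
            raise ValueError("ranges must be contiguous")
    # One entry per range start, plus the block index after the last block.
    starts = [first for first, _ in ranges]
    return starts + [ranges[-1][1] + 1]


def split_batch_range(batch_range):
    """Recover the inclusive (first_block, last_block) pairs from batch_range."""
    return [(start, stop - 1)
            for start, stop in zip(batch_range, batch_range[1:])]


# 10 blocks split into 3 ranges: blocks 1-4, 5-7 and 8-10.
batch_range = make_batch_range([(1, 4), (5, 7), (8, 10)])
print(batch_range)  # [1, 5, 8, 11]
print(split_batch_range(batch_range))  # [(1, 4), (5, 7), (8, 10)]
```

The array has one more element than there are ranges, and its last element (11 here) is the index of the last block plus one.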
subroutine, public dbt_methods::dbt_batched_contract_finalize | ( | type(dbt_type), intent(inout) | tensor, |
integer, intent(in), optional | unit_nr | ||
) |
Finalize batched contraction. This performs all communication that has been postponed in the contraction calls.
Definition at line 2174 of file dbt_methods.F.