Conditioning Model
Model Setup Using RTMB
Implemented in: R/model.R (sbt_model), R/get-data.R (get_data), R/parameters.R (get_parameters, get_map), and R/functions.R (cmb).
This model is implemented with RTMB and evaluated by sbt_model() in R/model.R. Data and parameters are assembled in R, then passed to MakeADFun():
data <- get_data(data_in)
parameters <- get_parameters(data)
map <- get_map(parameters)
obj <- MakeADFun(func = cmb(sbt_model, data), parameters = parameters, map = map)Estimated parameters can be fixed or turned off by mapping entries in map to factor(NA).
Model Structure
Implemented in: R/get-data.R (get_data) and consumed by R/model.R (sbt_model).
The model is a single age-structured stock with ages 0 to 30 and two seasons per year. Six fisheries contribute catch dynamics, and a seventh selectivity block is used for CPUE.
| Fishery | Description | Season |
|---|---|---|
| LL1 | Japanese LL areas 4-9 and other longline catches | 2 |
| LL2 | Taiwanese albacore LL and gillnet | 2 |
| LL3 | Japanese LL in Area 2 | 1 |
| LL4 | Japanese spawning (Area 1) | 1 |
| Indonesian | Indonesian spawning | 1 |
| Australian Surface | Surface fishery | 1 |
| CPUE block | Selectivity-only block used in the CPUE likelihood | 2 |
Population dynamics
Implemented in: R/dynamics.R (do_dynamics, get_harvest_rate) and called from R/model.R (sbt_model).
Let N_{y,s,a} be numbers-at-age at year y, season s \in \{1,2\}, age a \in \{0,\dots,A\}, and let S_a = \exp(-0.5M_a).
Season 1 to season 2:
N_{y,2,a} = N_{y,1,a}(1-h_{y,1,a})S_a
Season 2 to next year:
N_{y+1,1,a+1} = N_{y,2,a}(1-h_{y,2,a})S_a, \quad a=0,\dots,A-1
Plus-group update:
N_{y+1,1,A} = N_{y,2,A-1}(1-h_{y,2,A-1})S_{A-1} + N_{y,2,A}(1-h_{y,2,A})S_A
Recruitment:
N_{y+1,1,0} = R_{y+1}
where h_{y,s,a} = \sum_{f=1}^{6} H_{y,s,f,a}.
For fisheries with removal_switch_f = 0, harvest is:
F_{y,s,f} = \frac{C_{y,s,f}}{\sum_a N_{y,s,a}s_{f,y,a}w_{f,y,a} + 10^{-6}}, \quad H_{y,s,f,a} = F_{y,s,f}s_{f,y,a}
For fisheries with removal_switch_f = 1, age-specific removals are:
H_{y,s,f,a} = \frac{C_{y,s,f}A_{y,s,f,a}}{N_{y,s,a}\left(\sum_a A_{y,s,f,a}w_{f,y,a} + 10^{-6}\right)}
with A_{y,s,f,a} from sliced age composition (af_sliced_ysfa).
Catch predictions:
\hat{C}^{\text{numbers}}_{f,y,a} = \sum_{s=1}^{2} H_{y,s,f,a}N_{y,s,a}
\hat{C}^{\text{weight}}_{y,s,f} = \sum_a H_{y,s,f,a}N_{y,s,a}w_{f,y,a}
A penalty term is accumulated from posfun(1 - \sum_f F_{y,s,f}, eps = 0.001) each season-year.
Spawning biomass is:
\text{SSB}_y = \sum_a N_{y,1,a} \phi_{y,a}
Stock-Recruitment
Implemented in: R/recruitment.R (get_recruitment, get_recruitment_prior, get_rho) and R/dynamics.R (get_initial_numbers).
Recruitment follows Beverton-Holt with a depensation term and lognormal deviation:
R_y = \frac{\alpha \,\text{SSB}_y}{\beta + \text{SSB}_y} \left(1 - \exp\left(\frac{\log(0.5)\,\text{SSB}_y}{\text{sr\_dep}\,B_0}\right)\right) \exp(\varepsilon_y - 0.5\sigma_r^2)
In R/recruitment.R, sr_dep defaults to 1e-10.
Initial equilibrium quantities from get_initial_numbers():
\text{rel}N_0=1,\quad \text{rel}N_a=\text{rel}N_{a-1}\exp(-M_{a-1}),\quad \text{rel}N_A=\frac{\text{rel}N_A}{1-\exp(-M_A)}
R_0 = \frac{B_0}{\sum_a \phi_{1,a}\text{rel}N_a},\quad \alpha=\frac{4hR_0}{5h-1},\quad \beta=\frac{B_0(1-h)}{5h-1},\quad N^{\text{init}}_a = R_0\text{rel}N_a
Recruitment-deviation prior in get_recruitment_prior():
\varepsilon_{1:(Y-3)} \sim N(0,\sigma_r^2), \quad \varepsilon_{(Y-2):Y} \sim \text{AR1}(\phi, \sigma_r)
where \phi is estimated by get_rho() as the empirical correlation of \epsilon_y and \epsilon_{y+1} over years 1965 to the terminal year - 5.
Selectivity
Implemented in: R/selectivity.R (get_selectivity, get_selectivity_prior) and called from R/model.R (sbt_model).
Selectivity is estimated for seven blocks (six fisheries plus CPUE). For fishery f, let a^{\min}_f and a^{\max}_f be the estimated age range, and let c_f be years where sel_change_year_fy[f,y] = 1.
At each change year y \in c_f, unnormalized selectivity is:
s'_{f,y,a} = \exp(\lambda_{f,r,a}), \quad a \in [a^{\min}_f, a^{\max}_f]
and is normalized by the mean over the estimated age range:
s_{f,y,a} = \frac{s'_{f,y,a}}{\text{mean}_{a^{\min}_f:a^{\max}_f}(s'_{f,y,a})}
For ages above a^{\max}_f, if sel_end_f[f] = 1 then s_{f,y,a}=s_{f,y,a^{\max}_f}, otherwise these ages remain zero. For years without a change point, selectivity is copied forward from the previous year.
The prior for each fishery’s log-selectivity matrix \Lambda_f is separable AR1 across change-year and age dimensions:
\log p(\Lambda_f) = \log \left[\text{dseparable}\left(\text{AR1}(\rho^y_f), \text{AR1}(\rho^a_f)\right)\right]
with scale
\sigma_f = \frac{\exp(\eta_f)}{\sqrt{1-(\rho^y_f)^2}\sqrt{1-(\rho^a_f)^2}}
where \eta_f = \text{par\_log\_sel\_sigma}[f].
A more complete explanation perhaps is detailed as follows:
Selectivity Prior (Separable AR1)
For fishery f, let the log-selectivity parameters be
\Lambda_f = \{ \lambda_{r,a} \},
where r indexes selectivity change-years and a indexes age.
Vectorizing the matrix,
\mathrm{vec}(\Lambda_f) \sim \mathrm{MVN}(0, \Sigma_f).
We assume a separable covariance structure:
\Sigma_f = \sigma_f^2 \left( \Sigma^{(y)}_f \otimes \Sigma^{(a)}_f \right),
where \otimes denotes the Kronecker product.
AR1 Structure Across Change-Years
For change-years,
\text{Cov}(\lambda_{r,a}, \lambda_{r',a}) = \sigma_f^2 \, (\rho^y_f)^{|r-r'|}.
AR1 Structure Across Ages
For ages,
\text{Cov}(\lambda_{r,a}, \lambda_{r,a'}) = \sigma_f^2 \, (\rho^a_f)^{|a-a'|}.
Combined Separable Covariance
Under separability,
\text{Cov}(\lambda_{r,a}, \lambda_{r',a'}) = \sigma_f^2 \, (\rho^y_f)^{|r-r'|} (\rho^a_f)^{|a-a'|}.
Scale Parameter
The marginal standard deviation is defined as
\sigma_f = \frac{\exp(\eta_f)} {\sqrt{1-(\rho^y_f)^2} \sqrt{1-(\rho^a_f)^2}},
where
\eta_f = par \log sel \sigma[f].
Growth
Implemented in: R/length-weight.R (get_length_at_age, get_dl) and giR/dynamics.R (get_phi).
Mean length-at-age is time varying by year and season (\text{length\_mu}_{y,s,a}), and SD-at-age is fixed (\text{length\_sd}_a). Age-length keys (alk_ysal) and distribution weights (dl_yal) are computed from these inputs and used in composition likelihoods.
Spawning output-at-age in year y (get_phi) is:
\phi_{y,a} = \frac{\sum_l d_{y,a,l}\,l^{3\psi}\,\text{mat}(l)}{\phi^*_{y,A}}
with maturity-at-length
\text{mat}(l)=\frac{1}{1 + 19^{(L_{50}-l)/(L_{95}-L_{50})}}
and normalization by the oldest age value \phi^*_{y,A}.
Natural Mortality
Implemented in: R/natural-mortality.R (get_M) and called from R/model.R (sbt_model).
Natural mortality-at-age is generated by get_M(min_age, max_age, age_increase_M, m0, m4, m10, m30).
Define:
p = \frac{\log\left(\frac{m_0 - m_4}{m_0 - m_{10}}\right)}{\log(1/3)}, \quad \Delta_M = \frac{m_{30} - m_{10}}{A - a_{\text{inc}}}
where A is the maximum age and a_{\text{inc}} is age_increase_M.
Then:
M_a = \begin{cases} m_0 & a \in \{0,1\} \\ m_0 - (m_0 - m_{10})\left(\frac{a-1}{9}\right)^p & 2 \le a \le 9 \\ m_{10} & 10 \le a \le a_{\text{inc}} \\ M_{a-1} + \Delta_M & a_{\text{inc}} < a < A \\ m_{30} & a = A \end{cases}
In the current RTMB implementation, all four mortality parameters are estimable unless mapped out in get_map().
Tagging Model
Implemented in: R/likelihoods.R (get_tag_like) and called from R/model.R (sbt_model).
Tag recapture data are modeled by cohort index k, tagger group t, release age i, and recapture age j (get_tag_like).
Adjusted releases (same-year recaptures removed and expanded for reporting):
N^{\text{adj}}_{k,t,i} = N_{k,t,i} - \frac{R_{k,t,i,i}}{\nu_{k+i-2,i}}
where \nu_{y,a} is the age- and year-specific reporting rate matrix (tag_rep_rates_ya).
To account for incomplete mixing in season 1, the tagged-fish harvest component is scaled by:
h^+_{k,j} = H_{\text{tag}}\,h_{y,1,j+1}, \quad y = \text{minK} + k - 1 + j
with H_{\text{tag}} = \exp(\text{par\_log\_tag\_H\_factor}).
Define survival and recapture terms:
S^{(1)}_{k,t,j} = (1-h^+_{k,j})(1-h_{y,2,j+1})\exp(-M_{j+1}-\Omega_t) S^{(2)}_{k,t,j} = (1-h^+_{k,j})(1-h_{y,2,j+1})\exp(-M_{j+1}-2\Omega_t) f^{(1)}_{k,t,j} = h^+_{k,j} + (1-h^+_{k,j})\exp\{-0.5(M_{j+1}+\Omega_t)\}h_{y,2,j+1} f^{(2)}_{k,t,j} = h^+_{k,j} + (1-h^+_{k,j})\exp\{-0.5(M_{j+1}+2\Omega_t)\}h_{y,2,j+1}
For the year immediately after release:
S^{\ast(1)}_{k,t,i} = \exp(-M_{i+1}-\Omega_t), \quad S^{\ast(2)}_{k,t,i} = \exp(-M_{i+1}-2\Omega_t)
Recapture probability (at least one tag returned):
p_{k,t,i,j} = \begin{cases} \left(2\xi_t S^{\ast(1)}_{k,t,i}f^{(1)}_{k,t,j} - \xi_t^2 S^{\ast(2)}_{k,t,i}f^{(2)}_{k,t,j}\right)\nu_{k+j-2,j}, & j=i+1 \\ \left(2\xi_t S^{\ast(1)}_{k,t,i}\prod_{s=i+1}^{j-1}S^{(1)}_{k,t,s}f^{(1)}_{k,t,j} - \xi_t^2 S^{\ast(2)}_{k,t,i}\prod_{s=i+1}^{j-1}S^{(2)}_{k,t,s}f^{(2)}_{k,t,j}\right)\nu_{k+j-2,j}, & j>i+1 \end{cases}
where \xi_t and \Omega_t correspond to immediate and continuous tag shedding (tag_shed_immediate, tag_shed_continuous).
Predicted recaptures are:
\hat{R}_{k,t,i,j} = N^{\text{adj}}_{k,t,i} p_{k,t,i,j}
Catch underreporting
Implemented in data preprocessing: R/get-data.R (get_data).
In the current RTMB workflow, catch adjustments are handled in get_data() using scenario multipliers (catch_LL1_case, catch_surf_case) plus additional historical undercatch components (catch_UA). There is no active standalone I42-style underreporting switch in R/model.R.
Predicted quantities
Implemented across R/dynamics.R and R/likelihoods.R; section-specific mappings are listed below.
Catch-at-age and catch-at-length
Implemented in: R/dynamics.R (do_dynamics) for catch predictions, and R/likelihoods.R (get_age_like, get_length_like) for observation models.
Predicted catch-at-age used by age and length likelihoods is accumulated across seasons:
\hat{C}^{\text{num}}_{f,y,a} = \sum_{s=1}^{2} H_{y,s,f,a}N_{y,s,a}
For length data, predicted proportions are generated using ALKs:
\hat{L}_{i,l} \propto \sum_a \hat{C}^{\text{num}}_{f_i,y_i,a}\,\text{ALK}_{y_i,s_i,a,l}
with lower bins optionally pooled via lf_minbin.
CPUE
Implemented in: R/likelihoods.R (get_cpue_like).
The CPUE likelihood uses fishery-7 selectivity and season-2 abundance (get_cpue_like). Define a creep adjustment:
a_1 = a_{\text{init}}, \quad a_i = a_{i-1} + \delta_{\text{cpue}} \; (i>1)
and effective abundance index:
\tilde{N}_i = \frac{\sum_{a=4}^{A} s_{7,y_i,a}N_{y_i,2,a}}{\text{mean}(s_{7,y_i,a_1:a_2})}
Unscaled log prediction:
\eta_i = \log(a_i) + \omega_{\text{cpue}}\log(\tilde{N}_i)
The series is centered and shifted by q:
\log \hat{I}_i = \eta_i - \log\left(\frac{1}{n}\sum_{i=1}^n e^{\eta_i}\right) + \log q_{\text{cpue}}
where \omega_{\text{cpue}} = \exp(\text{par\_log\_cpue\_omega}).
Tag returns
Implemented in: R/likelihoods.R (get_tag_like).
Predicted tag returns are:
\hat{R}_{k,t,i,j} = N^{\text{adj}}_{k,t,i} p_{k,t,i,j}
using the recapture probabilities defined in the Tagging section.
Aerial survey
Implemented in: R/likelihoods.R (get_aerial_survey_like).
Legacy data note: the aerial survey is retained because it is still implemented in the codebase, but it is treated as a legacy data type.
The unscaled aerial index for observation year y_i is:
\tilde{I}_i = \sum_{a=2}^{4} s^{\text{aerial}}_a\,w_{6,y_i,a}\,N_{y_i,1,a}
Selectivity vector s^{\text{aerial}} depends on aerial_switch:
-
0: (0.5,1,1), likelihood off -
1: (\exp(\theta_1),1,\exp(\theta_2)) -
2: (1,1,1) -
3: (0.33,1,0.33) -
4: (0.5,1,1)
A scaling parameter is estimated analytically each evaluation and applied to \tilde{I}_i.
Trolling survey
Implemented in: R/likelihoods.R (get_troll_like).
The trolling index is based on age-1 abundance in season 1:
\tilde{T}_i = N_{y_i,1,1}
A weighted least-squares scale factor is estimated:
q_{\text{troll}} = \frac{\sum_i \tilde{T}_i\,O_i/\sigma_i^2}{\sum_i \tilde{T}_i^2/\sigma_i^2}, \quad \sigma_i^2 = \text{SD}_{i}^2 + \tau_{\text{troll}}^2
and predictions are \hat{T}_i = q_{\text{troll}}\tilde{T}_i.
Objective Function
Implemented in: R/model.R (sbt_model), summing likelihood and prior components from helper functions.
The implemented objective in sbt_model() is:
\begin{aligned} \text{nll} = {}& lp_{\text{prior}} + \sum_f lp_{\text{sel},f} + lp_{\text{rec}} + lp_{\text{penalty}} \\ &+ \sum_i lp_{\text{af},i} + \sum_i lp_{\text{lf},i} + \sum_i lp_{\text{cpue\_lf},i} \\ &+ \sum_i lp_{\text{cpue},i} + \sum_i lp_{\text{aerial},i} + lp_{\text{aerial\_tau}} \\ &+ \sum_i lp_{\text{troll},i} + \sum_i lp_{\text{tag},i} + \sum_i lp_{\text{pop},i} \\ &+ \sum_i lp_{\text{hsp},i} + \sum_i lp_{\text{gt},i} \end{aligned}
CPUE Data
Implemented in: R/likelihoods.R (get_cpue_like).
Observed CPUE is modeled on log scale with year-specific SD:
\log O_i \sim N\left(\log \hat{I}_i,\; \sigma_i\right), \quad \sigma_i = \sqrt{\text{cpue}_\sigma_i^2 + \sigma_{\text{cpue}}^2}
where \sigma_{\text{cpue}} = \exp(\text{par\_log\_cpue\_sigma}).
Catch-at-age and Catch-at-length
Implemented in: R/likelihoods.R (get_age_like, get_length_like, get_cpue_length_like).
For fisheries with removal_switch_f = 0, composition likelihoods support four options:
-
switch = 1: multinomial (dmultinom) -
switch = 2: Dirichlet (ddirichlet) -
switch = 3: Dirichlet-multinomial (ddirmult) -
switch = 9: multinomial/KL approximation
Observed and predicted proportions are normalized within observation, with end bins pooled as defined in get_age_like() and get_length_like().
Tag returns likelihood
Implemented in: R/likelihoods.R (get_tag_like).
For each non-empty release set (k,t,i) with recapture ages j=i+1,\dots,J_k, the model uses a Dirichlet-multinomial form.
Define:
\omega_{k,t,i} = \max\left(\frac{N_{k,t,i} - \phi_{\text{tag}}}{\phi_{\text{tag}} - 1},\;10^{-3}\right)
with \phi_{\text{tag}} = \text{tag\_var\_factor}, and
R_{k,t,i,\neg} = N^{\text{adj}}_{k,t,i} - \sum_{j=i+1}^{J_k} R_{k,t,i,j}, \quad p_{k,t,i,\neg} = 1 - \sum_{j=i+1}^{J_k} p_{k,t,i,j}
Then (up to constants):
\begin{aligned} \log L_{k,t,i} = {}& \log\Gamma(\omega_{k,t,i}) - \log\Gamma(N^{\text{adj}}_{k,t,i}+\omega_{k,t,i}) \\ &+ \sum_{j=i+1}^{J_k}\left[\log\Gamma(R_{k,t,i,j}+\omega_{k,t,i}p_{k,t,i,j}) - \log\Gamma(\omega_{k,t,i}p_{k,t,i,j})\right] \\ &+ \log\Gamma(R_{k,t,i,\neg}+\omega_{k,t,i}p_{k,t,i,\neg}) - \log\Gamma(\omega_{k,t,i}p_{k,t,i,\neg}) \end{aligned}
and lp_{\text{tag}} = -\log L_{k,t,i}.
Aerial Survey Likelihood
Implemented in: R/likelihoods.R (get_aerial_survey_like).
Legacy data note: this likelihood remains documented for code fidelity, but the aerial stream is considered legacy in current use.
Let \Sigma = \text{aerial\_cov} + \tau_{\text{aerial}}^2 I, with \tau_{\text{aerial}} = \exp(\text{par\_log\_aerial\_tau}).
Using unscaled predictions \tilde{I}, define:
\log q_{\text{aerial}} = \frac{\mathbf{1}^T\Sigma^{-1}(\log O - \log \tilde{I})}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}
\log \hat{I} = \log \tilde{I} + \log q_{\text{aerial}}
and
\log O \sim \text{MVN}(\log \hat{I}, \Sigma)
In addition, the implementation reports and adds:
lp_{\text{aerial\_tau}} = 0.5\log|\Sigma|
Trolling Survey Likelihood
Implemented in: R/likelihoods.R (get_troll_like).
With \tau_{\text{troll}} = \exp(\text{par\_log\_troll\_tau}) and
\sigma_i = \sqrt{\sigma_i^2_\text{troll}} + \tau_{\text{troll}}^2}
the model fits
\log O_i \sim N(\log \hat{T}_i, \sigma_i), \quad \hat{T}_i = q_{\text{troll}}N_{y_i,1,1}
where q_{\text{troll}} is estimated analytically by weighted least squares each evaluation.
Close-kin Data
Implemented in data preparation and likelihood helpers: R/get-data.R (get_data), R/likelihoods.R (get_POP_like, get_HSP_like, get_GT_like).
The close-kin data are now an agreed dataset for use within the CCSBT Operating Model (OM). While the full dataset contains many details, for modeling purposes the simplified structure is as follows:
-
Juveniles: There are individuals i \in \Theta with a year of capture y_i, length-at-capture l_i, and either:
- A distribution of ages conditional on length, p(a_i \mid l_i) (via a growth relationship), or
- Direct age a_i (in the case of a known POP).
The cohort year of birth is calculated as c_i = y_i - a_i.
-
Adults: There are individuals j \in \Xi with a year of capture y_j, length-at-capture l_j, and either:
- A distribution p(a_j \mid l_j), or
- Direct age a_j (as in a POP match).
Defining the Parental Probability
Implemented in: R/likelihoods.R (get_POP_like).
The genetic data allow us to determine whether juvenile i and adult j form a Parent-Offspring Pair (POP) with high certainty. In a simple cartoon version, if there are N adults, the probability that any one is the parent of a juvenile is 2/N (either a mother or a father).
However, in a more realistic model with multiple cohorts and age structure, the following two factors must be considered:
- Relative reproductive output varies with age (e.g., maturity, batch fecundity, spawning history).
- Relative abundance decreases with age due to mortality and recruitment dynamics.
To compute the probability that adult j is the parent of juvenile i, we balance these effects. Let:
\quad c_i = y_i - a_i: birth year of juvenile i
\quad a_j - a_i: age of adult j in the birth year of juvenile i
\quad \varphi_{a_j - a_i}: reproductive output of the adult at that age
Then the parental probability \pi_{ij} is:
\pi_{ij} = \frac{2 \varphi_{a_j - a_i}}{ \sum_a N_{y_i - a, a} \varphi_a }
where N_{y,a} is the abundance of fish of age a in year y, and \sum_a N_{c_i,a} \varphi_a is the total reproductive output in the birth year of juvenile i.
Close-kin Likelihood
Implemented in: R/likelihoods.R (get_POP_like).
At the individual level, the likelihood of observing the parent-offspring pairs (POPs), denoted \Psi, is a product of independent Bernoulli distributions:
L(\Psi \mid \circ) = \prod_{i \in \Theta} \prod_{j \in \Xi} \pi_{ij}^{k_{ij}} (1 - \pi_{ij})^{1 - k_{ij}}
where k_{ij} = 1 if the \{i, j\} pair is a POP and 0 otherwise.
Given that the full comparison set includes millions of \{i, j\} combinations, this base-level likelihood is computationally infeasible. To simplify:
- Group comparisons by:
- Adult capture age a
- Adult capture year y
- Juvenile cohort year c
Let:
\quad \Omega_{c,y,a} be the number of comparisons in group (c, y, a)
\quad \Psi_{c,y,a} be the number of POPs observed in that group
Then the grouped likelihood becomes a binomial likelihood:
L(\Psi_{c,y,a} \mid \cdot) \propto \prod_{c} \prod_{y} \prod_{a} \pi_{c,y,a}^{\Psi_{c,y,a}} (1 - \pi_{c,y,a})^{\Omega_{c,y,a} - \Psi_{c,y,a}}
Standardised Residuals
Implemented status: conceptual diagnostic description; no dedicated close-kin residual helper in current RTMB source.
As in catch-at-age and tagging models, residuals help evaluate model fit.
Under the binomial assumption, the standardised residual for close-kin group (c, y, a) is:
\hat{\varepsilon}_{c,y,a} = \frac{ \Psi_{c,y,a} - \Omega_{c,y,a} \pi_{c,y,a} }{ \sqrt{ \Omega_{c,y,a} \pi_{c,y,a} (1 - \pi_{c,y,a}) } }
If the binomial assumption holds, \hat{\varepsilon}_{c,y,a} should follow a standard normal distribution.
Aggregated Residuals: Higher-Level Approximation
Implemented status: conceptual diagnostic description; no dedicated aggregated-residual helper in current RTMB source.
A key challenge with close-kin data is deciding the appropriate level to evaluate residuals. At the base level, individual parental probabilities \pi_{ij} are often extremely low, with many comparisons and very few POP detections (often only one or two). This structure is uncommon in ecological models, making standard residual analysis (e.g., expecting variance ≈ 1) potentially misleading.
To address this, one more nuanced approach is to aggregate residuals to a higher level — such as cohort c and adult capture year y — by summing across adult ages.
Using the normal approximation to the binomial, the expected number of POPs at this aggregated level is:
\hat{\mu}_{c,y} = \sum_a \Omega_{c,y,a} \pi_{c,y,a}
with corresponding variance:
\hat{\sigma}^2_{c,y} = \sum_a \Omega_{c,y,a} \pi_{c,y,a} (1 - \pi_{c,y,a})
This allows a higher-level standardized residual:
\hat{\varepsilon}_{c,y} = \frac{ \Psi_{c,y} - \hat{\mu}_{c,y} }{ \sqrt{ \hat{\sigma}^2_{c,y} } }
While this normal approximation is useful for visual diagnostics, it is not ideal with very small probabilities or uneven sample sizes. In such cases, it is advisable to compute the exact probability from the sum of binomial variables, especially if assessing how well the OM explains the close-kin data.
Half-Sibling Pairs (HSP)
Implemented in: R/likelihoods.R (get_HSP_like).
For each HSP data record, the model computes a probability from age-specific reproductive output and survival between cohorts (get_HSP_like):
\gamma_{y,a} = \frac{N_{y,2,a}\phi_{y,a}}{\text{SSB}_y}
and
p^{\text{HSP}}_i = \frac{4q_{\text{hsp}}}{\text{SSB}_{c_{\max}}} \sum_a \gamma_{c_{\min},a}\,\text{Surv}(a,c_{\max}-c_{\min})\,\phi_{c_{\max},a'}
with binomial likelihood
K_i \sim \text{Binomial}(N_i, p^{\text{HSP}}_i \times \text{hsp\_false\_negative})
where q_{\text{hsp}} = \exp(\text{par\_log\_hsp\_q}).
Gene Tagging (GT)
Implemented in: R/likelihoods.R (get_GT_like).
For each GT observation (get_GT_like):
p^{\text{GT}}_i = \frac{N^{\text{rel}}_i}{q_{\text{gt}}N_{y^{\text{rel}}_i,1,a^{\text{rel}}_i}}, \quad q_{\text{gt}} = \exp(\text{par\_log\_gt\_q})
Likelihood is binomial (gt_switch = 1) or beta-binomial (gt_switch = 2).
Prior distributions
Implemented in: R/priors.R (get_priors, evaluate_priors), R/recruitment.R (get_recruitment_prior), R/selectivity.R (get_selectivity_prior), and parameter mapping in R/parameters.R (get_map).
The prior term in the objective is:
lp_{\text{prior}} = -\sum_{m} \log p(\theta_m)
where active priors are defined in get_priors() (and may be changed in code). Current defaults are:
- \text{par\_log\_psi} \sim N(\log(1.75), 0.122^2)
- \text{par\_log\_m10} \sim N(\log(0.1), 0.06^2)
- \text{par\_log\_h} \sim N(\log(1.0), 1.5^2)
- \text{par\_log\_cpue\_omega} \sim N(0.875, 0.1^2)
In addition to lp_prior, two structured priors are always included:
- Recruitment-deviation prior
lp_recfromget_recruitment_prior() - Selectivity prior \sum_f lp_{\text{sel},f} from
get_selectivity_prior()
Parameters can be fixed by mapping in get_map(), which removes them from estimation (and effectively from prior contribution).
MSY Calculation
Implemented in: R/msy.R (msy_calc, find_msy, read_msy_dat, write_msy_report, print_msy_report).
The current implementation solves an equilibrium two-season, multi-fishery MSY problem using fully-selected fishery mortalities F^{\text{full}}_f.
MSY State Equations
Implemented in: R/msy.R (msy_calc).
Fishery-at-age mortality is:
F_{f,a} = F^{\text{full}}_f s_{f,a}
Seasonal exploitation rates are accumulated as:
ER_{2,a} = \sum_{f=1}^{2} F_{f,a}, \qquad ER_{1,a} = \sum_{f=3}^{n_f} F_{f,a}
To avoid invalid exploitation rates near 1, each seasonal maximum is smoothed with a posfun-style adjustment. Let x_s = 1 - \max_a(ER_{s,a}) and \epsilon = 0.05. Then:
\text{sr\_adj}_s = \begin{cases} x_s, & x_s \ge \epsilon \\ \dfrac{\epsilon}{2 - x_s/\epsilon}, & x_s < \epsilon \end{cases}
\text{adj}_s = \frac{1-\text{sr\_adj}_s}{\max_a(ER_{s,a})}
and ER_{s,a} and F_{f,a} are rescaled by adj. The penalty contribution is:
\text{pen} = 10^6 \sum_s 0.01\,(x_s-\epsilon)^2 \, \mathbf{1}(x_s < \epsilon)
Relative equilibrium survivorship-per-recruit is:
Rel_0 = 1
Rel_a = Rel_{a-1}(1-ER_{1,a-1})(1-ER_{2,a-1})e^{-M_{a-1}}, \quad a=1,\dots,A
Plus-group correction:
Rel_A \leftarrow \frac{Rel_A}{1-(1-ER_{1,A})(1-ER_{2,A})e^{-M_A}}
Spawning biomass per recruit and equilibrium recruitment:
SBR = \sum_{a=0}^{A} Rel_a \phi_a
Rec = \max\left(1,\; \alpha - \frac{\beta}{SBR}\right)
Biomass metrics:
SSB^{\text{msy}} = Spbio = Rec \cdot SBR
Tbio = Rec \sum_{a=2}^{A} Rel_a \, spwt_a
Season-2-start abundance proxy for fisheries 1–4:
Rel2_a = Rel_a(1-ER_{1,a})e^{-0.5M_a}
Catch by fishery:
C_f = \begin{cases} \sum_a Rec \cdot Rel_a \cdot F_{f,a}\cdot cwt_{f,a}, & f \in \{5,6\} \\ \sum_a Rec \cdot Rel2_a \cdot F_{f,a}\cdot cwt_{f,a}, & f \in \{1,2,3,4\} \end{cases}
C^{\text{msy}} = Ctot = \sum_{f=1}^{n_f} C_f
Estimated fishery catch split:
\widehat{p}_f = \begin{cases} C_f/C^{\text{msy}}, & C^{\text{msy}} > 0\\ 0, & C^{\text{msy}} = 0 \end{cases}
Annual total mortality-at-age reported by the routine:
F^{\text{tot}}_a = -\log\left[(1-ER_{1,a})(1-ER_{2,a})\right]
Unfished spawning biomass per recruit:
Rel^0_0 = 1,\qquad Rel^0_a = Rel^0_{a-1}e^{-M_{a-1}}
Rel^0_A \leftarrow \frac{Rel^0_A}{1-e^{-M_A}},\qquad SBR_0 = \sum_{a=0}^{A} Rel^0_a \phi_a
MSY Optimization Objective
Implemented in: R/msy.R (find_msy).
find_msy() estimates F^{\text{full}} using bounded nlminb:
\min_{F^{\text{full}}_f \in [10^{-5},\,0.5]} \left[ -\log(C^{\text{MSY}}+1) + \lambda \sum_f (\widehat{p}_f - p_f^{\text{target}})^2 + \text{pen} \right]
with start values F^{\text{full}}=F_{\text{ini}} and default \lambda = 10^5.
Projection Model
Implemented in: R/projections.R (project_dynamics, project_selectivity, project_rec_devs) with core dynamics from R/dynamics.R (do_dynamics).
Model Structure
Implemented in: R/projections.R (project_dynamics) through projected catch/selectivity inputs to the dynamics engine.
Same as in the conditioning model, except that only four fisheries are included:
- LL1 fishery (second season)
- LL2 fishery (second season)
- Indonesian spawning fishery (first season)
- Australian surface fishery (first season)
Population Model
Implemented in: R/projections.R (project_dynamics) via calls to R/dynamics.R (do_dynamics).
Identical to the conditioning model.
Initial Abundances
Implemented in: R/projections.R (project_dynamics) using terminal reconstruction state and projected recruitment deviates from project_rec_devs.
Initial age-specific abundances at the start of the projection are estimated from the conditioning model. To represent process error in recruitment, lognormal autocorrelated error is added to initial abundances for ages 0–2.
Let Y_1 be the first projection year. Then:
N_{Y_1,4} = \hat{N}_{Y_1,4} \cdot \exp(0.4z - 0.08)
N_{Y_1,3} = \hat{N}_{Y_1,3} \cdot \exp(0.4z - 0.08)
N_{Y_1,2} = \hat{N}_{Y_1,2} \cdot \exp(\epsilon_{Y_1})
N_{Y_1,1} = \hat{N}_{Y_1,1} \cdot \exp(\hat{\rho} \epsilon_{Y_1-2} + \epsilon_{Y_1-1})
N_{Y_1,0} = \hat{N}_{Y_1,0} \cdot \exp(\hat{\rho}^2 \epsilon_{Y_1-2} + \hat{\rho} \epsilon_{Y_1-1} + \epsilon_{Y_1})
Where:
-
z \sim {N}(0,1)
-
\epsilon_y \sim {N}\left(0, (1 - \hat{\rho}^2) \sigma_R^2\right)
- \sigma_R = 0.6 (and an extra lognormal SD = 0.4 is added for \hat{N}_{Y_1-4,0} and \hat{N}_{Y_1-3,0})
These imply:
\tau_{Y_1} = \hat{\tau}_{Y_1} + \hat{\rho}^2 \epsilon_{Y_1-2} + \hat{\rho} \epsilon_{Y_1-1} + \epsilon_{Y_1}
and subsequent years follow:
\tau_{Y_1+1} = \hat{\rho} \tau_{Y_1} + \epsilon_{Y_1+1}, \quad \text{etc.}
This formulation assumes autocorrelated recruitment starting in Y_1-2.
However, it may over-propagate uncertainty from Y_1-3 due to grid-based point estimates.
A sensitivity run with uncorrelated \tau_{Y_1-2} is used to address this.
Selectivity in Projections
Implemented in: R/projections.R (project_selectivity).
Unlike the conditioning model, the projection model does not use random-walk selectivity due to the risk of unrealistic drift. Instead, it starts from the mean of the last 3 estimates and adds autocorrelated process error:
For fishery 1:
s_{1,a,y+1} = s_{1,a,y} \cdot \exp(\varepsilon_{a,y}) \quad \text{for } a_1^{\min,s} \le a \le a_1^{\max,s}, \text{ where } a_1^{\min,s} = 2, a_1^{\max,s} = 17
Error terms follow:
\varepsilon_{a,y} = \rho_{\text{sel}} \varepsilon_{a,y-1} + \sqrt{1 - \rho_{\text{sel}}^2} \, \eta_{a,y}, \quad \eta_{a,y} \sim {N}(0, 0.05^2), \quad \rho_{\text{sel}} = 0.7
Selectivities update annually (not in blocks, as in the conditioning model).
Australian Surface Fishery (Special Case)
Implemented status: legacy/ADMB-style narrative; no dedicated age-3 boost rule in current RTMB projection helpers (R/projections.R).
Let:
P_{y,3} = \frac{N_{y,3}}{\sum_{a=1}^5 N_{y,a}}, \quad \bar{P}_3 = \frac{1}{10} \sum_{y=y_{n2}-9}^{y_{n2}} P_{y,3}
If P_{y,3} \ge \bar{P}_3:
s_{6,y,a} = s_{6,y-1,a} \cdot \exp(\varepsilon_{6,y,a}) \quad \text{for } a = 1, \ldots, 5, \quad \varepsilon_{6,y,a} \sim {N}(0, 0.1^2)
Otherwise, boost selectivity at age 3:
s_{6,y,3} = s_{6,y-1,3} \cdot \left(1 + 0.5 \cdot \frac{\bar{P}_3 - P_{y,3}}{P_{y,3}} \right), \quad s_{6,y,a} = s_{6,y-1,a} \cdot \exp(\varepsilon_{6,y,a}) \text{ for } a = 1,2,4,5
Fishing Mortality in Projections
Implemented in: R/projections.R (project_dynamics) via R/dynamics.R (get_harvest_rate, do_dynamics).
Year subscripts are omitted for simplicity. Catch is defined as:
C = \sum_a \sum_f s_{f,a} F_f N_a
For each fishery f:
C = \sum_f C_f, \quad \text{where } C_f = \left( \sum_a s_{f,a} N_a \right) F_f
Solving for F_f:
F_f = \frac{C_f}{\sum_a s_{f,a} N_a}
Additionally:
- Catch-at-age: C_{f,a} = s_{f,a} F_f N_a
- Catch at age: C_a = \left( \sum_f s_{f,a} F_f \right) N_a
Note: A problem occurs if \sum_f s_{f,a} F_f > 1, which implies C_a > N_a.
Harvest Rate Bounding and Selectivity Adjustment
Implemented status: partial. Current RTMB uses posfun penalty in R/dynamics.R (get_harvest_rate); explicit A1-A6 rescaling routine is legacy narrative.
In early MP trials, the model enforced a hard upper bound on age-specific exploitation rates:
\sum_f s_{f,a} F_f \le 0.99
When this bound was exceeded, the model reduced the catch at that age, but not the selectivity or fishing mortality for other ages, which led to underutilization of TAC.
An improved method adjusts selectivities at the age(s) where the bound is exceeded (from MP Workshop II, Queenstown, April 2003).
Case of One Fleet (or Non-Overlapping Selectivities)
Implemented status: legacy/ADMB analytical description; no dedicated standalone helper in current RTMB source.
Assume a single fleet. If F \le 0.9, no adjustment is needed. If F > 0.9, apply:
C = \sum_a g(s_a F) N_a \tag{A1}
where the adjusted selectivity is:
s_a^* = \frac{g(s_a F)}{F} \tag{A2}
The proposed function g(x) is:
g(x) = \begin{cases} x & \text{if } x \le 0.9 \\ 0.9 + 0.08 \left[ 1 - \exp(-10(x - 0.9)) \right] & \text{if } x > 0.9 \end{cases} \tag{A3}
Notes: - g(x) < 1 ensures C_a = g(s_a F) N_a < N_a - g(x) is continuous and differentiable at x = 0.9 - Newton-Raphson is used to solve equation (A1) for F
Extension to Multiple Fleets
Implemented status: partially represented by multi-fleet harvest in R/dynamics.R (get_harvest_rate); the explicit A4-A6 derivation is legacy documentation.
If:
\sum_f s_{f,a} F_f \le 0.9 \quad \text{for all } a
then proceed as normal. Otherwise, for any age a:
C = \sum_a g\left(\sum_f s_{f,a} F_f\right) N_a \tag{A4}
To proportionally reduce selectivity for each fleet at age a, define:
s^*_{f,a} = s_{f,a} \cdot \left[ \frac{g\left( \sum_f s_{f,a} F_f \right)}{\sum_f s_{f,a} F_f} \right] \tag{A5}
Then:
C_{f,a} = s^*_{f,a} F_f N_a = s_{f,a} F_f N_a \cdot \left[ \frac{g\left( \sum_f s_{f,a} F_f \right)}{\sum_f s_{f,a} F_f} \right] \tag{A6}
This guarantees:
C_a = \sum_f C_{f,a} = g\left( \sum_f s_{f,a} F_f \right) N_a < N_a
Solving for F_f When Bounds Are Exceeded
Implemented status: current RTMB computes F_f algebraically by fishery in R/dynamics.R (get_harvest_rate) and applies posfun penalty rather than an explicit multivariate root-solver.
Once selectivity is adjusted via:
C_a = \sum_f C_{f,a} = \sum_f s^*_{f,a} F_f N_a = g\left(\sum_f s_{f,a} F_f \right) N_a
Then the coupled non-linear equations must be solved for each F_f:
C_f = \sum_a C_{f,a} = \sum_a s^*_{f,a} F_f N_a
Substituting s^*_{f,a} gives:
C_f = \sum_a s_{f,a} F_f \left[ \frac{g\left( \sum_{f'} s_{f',a} F_{f'} \right)}{\sum_{f'} s_{f',a} F_{f'}} \right] N_a
This system is solved via a multivariate root-finding method, e.g., extended Newton-Raphson.
CPUE and Aerial Survey Deviations
Implemented status: partial. CPUE projection deviations are simulated in R/projections.R (project_dynamics) using R/likelihoods.R (get_cpue_like); aerial-deviation text below is legacy narrative.
CPUE Deviations
Implemented in: R/projections.R (project_dynamics) plus CPUE predictor from R/likelihoods.R (get_cpue_like).
CPUE is simulated with autocorrelated trends in catchability, superimposed on a 0.5% annual increase in q.
- Empirical autocorrelation \rho_{\text{cpue}} from the 1969–2008 time series is used
- For standard deviation, choose \max(0.2, \text{empirical SD})
- Override option: pass a value via
-cpuestd xxin the projection code
Aerial Survey Deviations
Implemented status: legacy narrative; no dedicated aerial-deviation simulation helper in current R/projections.R.
Aerial indices are simulated by adding lognormal deviations. Initially, empirical variance and autocorrelation were used, but:
- When \tau_{\text{aerial}} was estimated as free, it produced high SDs (0.50–0.60)
- Residual autocorrelation was very low
To stabilize, \tau_{\text{aerial}} was fixed to 0.18, which with sampling error gives SD \approx 0.30 in log space.
Simulation Grid for Deviation Parameters
Implemented status: scenario documentation; no dedicated grid helper for this table in current RTMB code.
| Scenario | SD (CPUE) | \rho_{\text{cpue}} |
|---|---|---|
| Base | 0.20 | Empirical, correlated to last residual |
| highCpueCV | 0.30 | Empirical, correlated to last residual |
For the aerial survey, autocorrelation is set to zero due to short residual history and poor reliability.
Lags in Data Availability and Schedules of TAC Changes
Implemented status: policy/process documentation; not encoded as executable scheduling logic in current RTMB source.
When a TAC for year y is determined, the projection code assumes the following data are available:
- Catch data up to year y - 3
- TAC up to year y - 1
- CPUE up to year y - 3
- Age-composition data up to year y - 3
- Gene tagging index (age-2 abundance) in year y - 4
- Pop-HSP index of SSB for year y - 7
Options for TAC Update Frequency
Implemented status: policy documentation; not represented by a dedicated function in R/*.R.
OMMP8 recommended calculating the first TAC under the new MP in 2020 (for implementation in 2021), to allow more time for stakeholder consultation. This skips the usual lag for only the first decision.
The control file includes four TAC update schedule options:
-
Option (a): First TAC in 2021, then every year — no lag
-
Option (b): First TAC in 2021, then every other year — no lag
-
Option (c): First TAC in 2021, then every 3 years — no lag
- Option (d): First TAC in 2021, then every 3 years — with one-year lag
Example Schedule: Options (c) and (d)
Implemented status: illustrative schedule table; not generated by a dedicated function in current RTMB source.
In all cases, the first TAC is calculated in 2019, using:
- CPUE: up to 2018
- Gene tagging: 2016, 2017
- POP/HSP SSB: up to 2014
| Decision Year | Catch Data | CPUE Data | Gene Tagging | POP/HSP SSB | TAC Year | TAC Change? |
|---|---|---|---|---|---|---|
| 2017 | 2016 | 2016 | — | 2012 | 2019 | hardwired |
| 2018 | 2017 | 2017 | 2016 | 2013 | 2020 | hardwired |
| 2019 | 2018 | 2018 | 2017 | 2014 | 2021 | Yes |
| 2020 | 2019 | 2019 | 2018 | 2015 | 2020 | No |
| 2021 | 2020 | 2020 | 2019 | 2016 | 2021 | No |
| 2022 | 2021 | 2021 | 2020 | 2017 | 2022 | Yes |
| 2023 | 2022 | 2022 | 2021 | 2018 | 2023 | No |
Note: Shaded years (2016–2018) represent real data available in 2019 (not OM-simulated).
Tuning Levels
Implemented status: policy/tuning documentation; not encoded in a dedicated tuning module in current RTMB source.
The SFMWG proposed six tuning options based on the probability of rebuilding the stock to 20% of SSB_0 in 25 or 30 years. Short-term checkpoints were also set at 12 or 15 years from present, to check the probability of rebuilding to 10% of SSB_0 or to 2 \times SSB_{2009}.
Tuning Options and Checkpoint Years
Implemented status: documentation table; not generated by a dedicated function in current RTMB source.
| Tuning Level | Tuning Year | P[SSB ≥ 0.2 SSB_0] | Short-term Checkpoint Year |
|---|---|---|---|
| 1 | 2035 | 0.60 | 2022 |
| 2 | 2035 | 0.70 | 2022 |
| 3 | 2035 | 0.90 | 2022 |
| 4 | 2040 | 0.60 | 2025 |
| 5 | 2040 | 0.70 | 2025 |
| 6 | 2040 | 0.90 | 2025 |
Maximum–Minimum TAC Changes
Implemented status: policy constraint discussion; bounds can be explored by user-specified projection inputs, not by a dedicated hardcoded helper.
The SFMWG recommended examining two options for TAC change bounds:
- Maximum change: 3000 t or 5000 t
- Minimum change: 100 t
These are not hardcoded, so users can explore alternative values.
However, prior analyses indicated that these bounds act as binding constraints, especially at high tuning levels where the TAC was often pushed against the maximum limit.
Appendix: Transition from ADMB to RTMB
Implemented in: R/parameters.R (get_map), R/model.R (sbt_model), and workflow vignettes/scripts that construct MakeADFun objects.
This model was originally implemented in ADMB using control file flags. These have been replaced in RTMB with mapped parameters and script-based control. Key transitions include:
| ADMB Flag/Control Concept | RTMB Equivalent |
|---|---|
flag_rec_devs = 1 |
rec_devs not mapped (map = list(rec_devs = ...)) |
flag_est_q = 0 |
map = list(log_q = factor(NA)) |
| Estimation phase definitions | Controlled in R script via parameters and map
|
Profile runs via -mcmc
|
Handled via tmbstan or adnuts in R |
This makes the model setup more reproducible and scriptable, and decouples the estimation logic from text-based control files.
