Southern Bluefin Tuna Operating Model Documentation (RTMB) • sbt

Conditioning Model

Model Setup Using RTMB

Implemented in: R/model.R (sbt_model), R/get-data.R (get_data), R/parameters.R (get_parameters, get_map), and R/functions.R (cmb).

This model is implemented with RTMB and evaluated by sbt_model() in R/model.R. Data and parameters are assembled in R, then passed to MakeADFun():

data <- get_data(data_in)
parameters <- get_parameters(data)
map <- get_map(parameters)
obj <- MakeADFun(func = cmb(sbt_model, data), parameters = parameters, map = map)

Estimated parameters can be fixed or turned off by mapping entries in map to factor(NA).

Model Structure

Implemented in: R/get-data.R (get_data) and consumed by R/model.R (sbt_model).

The model is a single age-structured stock with ages 0 to 30 and two seasons per year. Six fisheries contribute catch dynamics, and a seventh selectivity block is used for CPUE.

Fishery	Description	Season
LL1	Japanese LL areas 4-9 and other longline catches	2
LL2	Taiwanese albacore LL and gillnet	2
LL3	Japanese LL in Area 2	1
LL4	Japanese spawning (Area 1)	1
Indonesian	Indonesian spawning	1
Australian Surface	Surface fishery	1
CPUE block	Selectivity-only block used in the CPUE likelihood	2

Population dynamics

Implemented in: R/dynamics.R (do_dynamics, get_harvest_rate) and called from R/model.R (sbt_model).

Let N_{y,s,a} be numbers-at-age at year y, season s \in \{1,2\}, age a \in \{0,\dots,A\}, and let S_a = \exp(-0.5M_a).

Season 1 to season 2:

N_{y,2,a} = N_{y,1,a}(1-h_{y,1,a})S_a

Season 2 to next year:

N_{y+1,1,a+1} = N_{y,2,a}(1-h_{y,2,a})S_a, \quad a=0,\dots,A-1

Plus-group update:

N_{y+1,1,A} = N_{y,2,A-1}(1-h_{y,2,A-1})S_{A-1} + N_{y,2,A}(1-h_{y,2,A})S_A

Recruitment:

N_{y+1,1,0} = R_{y+1}

where h_{y,s,a} = \sum_{f=1}^{6} H_{y,s,f,a}.

For fisheries with removal_switch_f = 0, harvest is:

F_{y,s,f} = \frac{C_{y,s,f}}{\sum_a N_{y,s,a}s_{f,y,a}w_{f,y,a} + 10^{-6}}, \quad H_{y,s,f,a} = F_{y,s,f}s_{f,y,a}

For fisheries with removal_switch_f = 1, age-specific removals are:

H_{y,s,f,a} = \frac{C_{y,s,f}A_{y,s,f,a}}{N_{y,s,a}\left(\sum_a A_{y,s,f,a}w_{f,y,a} + 10^{-6}\right)}

with A_{y,s,f,a} from sliced age composition (af_sliced_ysfa).

Catch predictions:

\hat{C}^{\text{numbers}}_{f,y,a} = \sum_{s=1}^{2} H_{y,s,f,a}N_{y,s,a}

\hat{C}^{\text{weight}}_{y,s,f} = \sum_a H_{y,s,f,a}N_{y,s,a}w_{f,y,a}

A penalty term is accumulated from posfun(1 - \sum_f F_{y,s,f}, eps = 0.001) each season-year.

Spawning biomass is:

\text{SSB}_y = \sum_a N_{y,1,a} \phi_{y,a}

Stock-Recruitment

Implemented in: R/recruitment.R (get_recruitment, get_recruitment_prior, get_rho) and R/dynamics.R (get_initial_numbers).

Recruitment follows Beverton-Holt with a depensation term and lognormal deviation:

R_y = \frac{\alpha \,\text{SSB}_y}{\beta + \text{SSB}_y} \left(1 - \exp\left(\frac{\log(0.5)\,\text{SSB}_y}{\text{sr\_dep}\,B_0}\right)\right) \exp(\varepsilon_y - 0.5\sigma_r^2)

In R/recruitment.R, sr_dep defaults to 1e-10.

Initial equilibrium quantities from get_initial_numbers():

\text{rel}N_0=1,\quad \text{rel}N_a=\text{rel}N_{a-1}\exp(-M_{a-1}),\quad \text{rel}N_A=\frac{\text{rel}N_A}{1-\exp(-M_A)}

R_0 = \frac{B_0}{\sum_a \phi_{1,a}\text{rel}N_a},\quad \alpha=\frac{4hR_0}{5h-1},\quad \beta=\frac{B_0(1-h)}{5h-1},\quad N^{\text{init}}_a = R_0\text{rel}N_a

Recruitment-deviation prior in get_recruitment_prior():

\varepsilon_{1:(Y-3)} \sim N(0,\sigma_r^2), \quad \varepsilon_{(Y-2):Y} \sim \text{AR1}(\phi, \sigma_r)

where \phi is estimated by get_rho() as the empirical correlation of \epsilon_y and \epsilon_{y+1} over years 1965 to the terminal year - 5.

Selectivity

Implemented in: R/selectivity.R (get_selectivity, get_selectivity_prior) and called from R/model.R (sbt_model).

Selectivity is estimated for seven blocks (six fisheries plus CPUE). For fishery f, let a^{\min}_f and a^{\max}_f be the estimated age range, and let c_f be years where sel_change_year_fy[f,y] = 1.

At each change year y \in c_f, unnormalized selectivity is:

s'_{f,y,a} = \exp(\lambda_{f,r,a}), \quad a \in [a^{\min}_f, a^{\max}_f]

and is normalized by the mean over the estimated age range:

s_{f,y,a} = \frac{s'_{f,y,a}}{\text{mean}_{a^{\min}_f:a^{\max}_f}(s'_{f,y,a})}

For ages above a^{\max}_f, if sel_end_f[f] = 1 then s_{f,y,a}=s_{f,y,a^{\max}_f}, otherwise these ages remain zero. For years without a change point, selectivity is copied forward from the previous year.

The prior for each fishery’s log-selectivity matrix \Lambda_f is separable AR1 across change-year and age dimensions:

\log p(\Lambda_f) = \log \left[\text{dseparable}\left(\text{AR1}(\rho^y_f), \text{AR1}(\rho^a_f)\right)\right]

with scale

\sigma_f = \frac{\exp(\eta_f)}{\sqrt{1-(\rho^y_f)^2}\sqrt{1-(\rho^a_f)^2}}

where \eta_f = \text{par\_log\_sel\_sigma}[f].

A more complete explanation perhaps is detailed as follows:

Selectivity Prior (Separable AR1)

For fishery f, let the log-selectivity parameters be

\Lambda_f = \{ \lambda_{r,a} \},

where r indexes selectivity change-years and a indexes age.

Vectorizing the matrix,

\mathrm{vec}(\Lambda_f) \sim \mathrm{MVN}(0, \Sigma_f).

We assume a separable covariance structure:

\Sigma_f = \sigma_f^2 \left( \Sigma^{(y)}_f \otimes \Sigma^{(a)}_f \right),

where \otimes denotes the Kronecker product.

AR1 Structure Across Change-Years

For change-years,

\text{Cov}(\lambda_{r,a}, \lambda_{r',a}) = \sigma_f^2 \, (\rho^y_f)^{|r-r'|}.

AR1 Structure Across Ages

For ages,

\text{Cov}(\lambda_{r,a}, \lambda_{r,a'}) = \sigma_f^2 \, (\rho^a_f)^{|a-a'|}.

Combined Separable Covariance

Under separability,

\text{Cov}(\lambda_{r,a}, \lambda_{r',a'}) = \sigma_f^2 \, (\rho^y_f)^{|r-r'|} (\rho^a_f)^{|a-a'|}.

Scale Parameter

The marginal standard deviation is defined as

\sigma_f = \frac{\exp(\eta_f)} {\sqrt{1-(\rho^y_f)^2} \sqrt{1-(\rho^a_f)^2}},

where

\eta_f = par \log sel \sigma[f].

Growth

Implemented in: R/length-weight.R (get_length_at_age, get_dl) and giR/dynamics.R (get_phi).

Mean length-at-age is time varying by year and season (\text{length\_mu}_{y,s,a}), and SD-at-age is fixed (\text{length\_sd}_a). Age-length keys (alk_ysal) and distribution weights (dl_yal) are computed from these inputs and used in composition likelihoods.

Spawning output-at-age in year y (get_phi) is:

\phi_{y,a} = \frac{\sum_l d_{y,a,l}\,l^{3\psi}\,\text{mat}(l)}{\phi^*_{y,A}}

with maturity-at-length

\text{mat}(l)=\frac{1}{1 + 19^{(L_{50}-l)/(L_{95}-L_{50})}}

and normalization by the oldest age value \phi^*_{y,A}.

Natural Mortality

Implemented in: R/natural-mortality.R (get_M) and called from R/model.R (sbt_model).

Natural mortality-at-age is generated by get_M(min_age, max_age, age_increase_M, m0, m4, m10, m30).

Define:

p = \frac{\log\left(\frac{m_0 - m_4}{m_0 - m_{10}}\right)}{\log(1/3)}, \quad \Delta_M = \frac{m_{30} - m_{10}}{A - a_{\text{inc}}}

where A is the maximum age and a_{\text{inc}} is age_increase_M.

Then:

M_a = \begin{cases} m_0 & a \in \{0,1\} \\ m_0 - (m_0 - m_{10})\left(\frac{a-1}{9}\right)^p & 2 \le a \le 9 \\ m_{10} & 10 \le a \le a_{\text{inc}} \\ M_{a-1} + \Delta_M & a_{\text{inc}} < a < A \\ m_{30} & a = A \end{cases}

In the current RTMB implementation, all four mortality parameters are estimable unless mapped out in get_map().

Tagging Model

Implemented in: R/likelihoods.R (get_tag_like) and called from R/model.R (sbt_model).

Tag recapture data are modeled by cohort index k, tagger group t, release age i, and recapture age j (get_tag_like).

Adjusted releases (same-year recaptures removed and expanded for reporting):

N^{\text{adj}}_{k,t,i} = N_{k,t,i} - \frac{R_{k,t,i,i}}{\nu_{k+i-2,i}}

where \nu_{y,a} is the age- and year-specific reporting rate matrix (tag_rep_rates_ya).

To account for incomplete mixing in season 1, the tagged-fish harvest component is scaled by:

h^+_{k,j} = H_{\text{tag}}\,h_{y,1,j+1}, \quad y = \text{minK} + k - 1 + j

with H_{\text{tag}} = \exp(\text{par\_log\_tag\_H\_factor}).

Define survival and recapture terms:

S^{(1)}_{k,t,j} = (1-h^+_{k,j})(1-h_{y,2,j+1})\exp(-M_{j+1}-\Omega_t) S^{(2)}_{k,t,j} = (1-h^+_{k,j})(1-h_{y,2,j+1})\exp(-M_{j+1}-2\Omega_t) f^{(1)}_{k,t,j} = h^+_{k,j} + (1-h^+_{k,j})\exp\{-0.5(M_{j+1}+\Omega_t)\}h_{y,2,j+1} f^{(2)}_{k,t,j} = h^+_{k,j} + (1-h^+_{k,j})\exp\{-0.5(M_{j+1}+2\Omega_t)\}h_{y,2,j+1}

For the year immediately after release:

S^{\ast(1)}_{k,t,i} = \exp(-M_{i+1}-\Omega_t), \quad S^{\ast(2)}_{k,t,i} = \exp(-M_{i+1}-2\Omega_t)

Recapture probability (at least one tag returned):

p_{k,t,i,j} = \begin{cases} \left(2\xi_t S^{\ast(1)}_{k,t,i}f^{(1)}_{k,t,j} - \xi_t^2 S^{\ast(2)}_{k,t,i}f^{(2)}_{k,t,j}\right)\nu_{k+j-2,j}, & j=i+1 \\ \left(2\xi_t S^{\ast(1)}_{k,t,i}\prod_{s=i+1}^{j-1}S^{(1)}_{k,t,s}f^{(1)}_{k,t,j} - \xi_t^2 S^{\ast(2)}_{k,t,i}\prod_{s=i+1}^{j-1}S^{(2)}_{k,t,s}f^{(2)}_{k,t,j}\right)\nu_{k+j-2,j}, & j>i+1 \end{cases}

where \xi_t and \Omega_t correspond to immediate and continuous tag shedding (tag_shed_immediate, tag_shed_continuous).

Predicted recaptures are:

\hat{R}_{k,t,i,j} = N^{\text{adj}}_{k,t,i} p_{k,t,i,j}

Catch underreporting

Implemented in data preprocessing: R/get-data.R (get_data).

In the current RTMB workflow, catch adjustments are handled in get_data() using scenario multipliers (catch_LL1_case, catch_surf_case) plus additional historical undercatch components (catch_UA). There is no active standalone I42-style underreporting switch in R/model.R.

Predicted quantities

Implemented across R/dynamics.R and R/likelihoods.R; section-specific mappings are listed below.

Catch-at-age and catch-at-length

Implemented in: R/dynamics.R (do_dynamics) for catch predictions, and R/likelihoods.R (get_age_like, get_length_like) for observation models.

Predicted catch-at-age used by age and length likelihoods is accumulated across seasons:

\hat{C}^{\text{num}}_{f,y,a} = \sum_{s=1}^{2} H_{y,s,f,a}N_{y,s,a}

For length data, predicted proportions are generated using ALKs:

\hat{L}_{i,l} \propto \sum_a \hat{C}^{\text{num}}_{f_i,y_i,a}\,\text{ALK}_{y_i,s_i,a,l}

with lower bins optionally pooled via lf_minbin.

CPUE

Implemented in: R/likelihoods.R (get_cpue_like).

The CPUE likelihood uses fishery-7 selectivity and season-2 abundance (get_cpue_like). Define a creep adjustment:

a_1 = a_{\text{init}}, \quad a_i = a_{i-1} + \delta_{\text{cpue}} \; (i>1)

and effective abundance index:

\tilde{N}_i = \frac{\sum_{a=4}^{A} s_{7,y_i,a}N_{y_i,2,a}}{\text{mean}(s_{7,y_i,a_1:a_2})}

Unscaled log prediction:

\eta_i = \log(a_i) + \omega_{\text{cpue}}\log(\tilde{N}_i)

The series is centered and shifted by q:

\log \hat{I}_i = \eta_i - \log\left(\frac{1}{n}\sum_{i=1}^n e^{\eta_i}\right) + \log q_{\text{cpue}}

where \omega_{\text{cpue}} = \exp(\text{par\_log\_cpue\_omega}).

Tag returns

Implemented in: R/likelihoods.R (get_tag_like).

Predicted tag returns are:

\hat{R}_{k,t,i,j} = N^{\text{adj}}_{k,t,i} p_{k,t,i,j}

using the recapture probabilities defined in the Tagging section.

Aerial survey

Implemented in: R/likelihoods.R (get_aerial_survey_like).

Legacy data note: the aerial survey is retained because it is still implemented in the codebase, but it is treated as a legacy data type.

The unscaled aerial index for observation year y_i is:

\tilde{I}_i = \sum_{a=2}^{4} s^{\text{aerial}}_a\,w_{6,y_i,a}\,N_{y_i,1,a}

Selectivity vector s^{\text{aerial}} depends on aerial_switch:

0: (0.5,1,1), likelihood off
1: (\exp(\theta_1),1,\exp(\theta_2))
2: (1,1,1)
3: (0.33,1,0.33)
4: (0.5,1,1)

A scaling parameter is estimated analytically each evaluation and applied to \tilde{I}_i.

Trolling survey

Implemented in: R/likelihoods.R (get_troll_like).

The trolling index is based on age-1 abundance in season 1:

\tilde{T}_i = N_{y_i,1,1}

A weighted least-squares scale factor is estimated:

q_{\text{troll}} = \frac{\sum_i \tilde{T}_i\,O_i/\sigma_i^2}{\sum_i \tilde{T}_i^2/\sigma_i^2}, \quad \sigma_i^2 = \text{SD}_{i}^2 + \tau_{\text{troll}}^2

and predictions are \hat{T}_i = q_{\text{troll}}\tilde{T}_i.

Objective Function

Implemented in: R/model.R (sbt_model), summing likelihood and prior components from helper functions.

The implemented objective in sbt_model() is:

\begin{aligned} \text{nll} = {}& lp_{\text{prior}} + \sum_f lp_{\text{sel},f} + lp_{\text{rec}} + lp_{\text{penalty}} \\ &+ \sum_i lp_{\text{af},i} + \sum_i lp_{\text{lf},i} + \sum_i lp_{\text{cpue\_lf},i} \\ &+ \sum_i lp_{\text{cpue},i} + \sum_i lp_{\text{aerial},i} + lp_{\text{aerial\_tau}} \\ &+ \sum_i lp_{\text{troll},i} + \sum_i lp_{\text{tag},i} + \sum_i lp_{\text{pop},i} \\ &+ \sum_i lp_{\text{hsp},i} + \sum_i lp_{\text{gt},i} \end{aligned}

CPUE Data

Implemented in: R/likelihoods.R (get_cpue_like).

Observed CPUE is modeled on log scale with year-specific SD:

\log O_i \sim N\left(\log \hat{I}_i,\; \sigma_i\right), \quad \sigma_i = \sqrt{\text{cpue}_\sigma_i^2 + \sigma_{\text{cpue}}^2}

where \sigma_{\text{cpue}} = \exp(\text{par\_log\_cpue\_sigma}).

Catch-at-age and Catch-at-length

Implemented in: R/likelihoods.R (get_age_like, get_length_like, get_cpue_length_like).

For fisheries with removal_switch_f = 0, composition likelihoods support four options:

switch = 1: multinomial (dmultinom)
switch = 2: Dirichlet (ddirichlet)
switch = 3: Dirichlet-multinomial (ddirmult)
switch = 9: multinomial/KL approximation

Observed and predicted proportions are normalized within observation, with end bins pooled as defined in get_age_like() and get_length_like().

Tag returns likelihood

Implemented in: R/likelihoods.R (get_tag_like).

For each non-empty release set (k,t,i) with recapture ages j=i+1,\dots,J_k, the model uses a Dirichlet-multinomial form.

Define:

\omega_{k,t,i} = \max\left(\frac{N_{k,t,i} - \phi_{\text{tag}}}{\phi_{\text{tag}} - 1},\;10^{-3}\right)

with \phi_{\text{tag}} = \text{tag\_var\_factor}, and

R_{k,t,i,\neg} = N^{\text{adj}}_{k,t,i} - \sum_{j=i+1}^{J_k} R_{k,t,i,j}, \quad p_{k,t,i,\neg} = 1 - \sum_{j=i+1}^{J_k} p_{k,t,i,j}

Then (up to constants):

\begin{aligned} \log L_{k,t,i} = {}& \log\Gamma(\omega_{k,t,i}) - \log\Gamma(N^{\text{adj}}_{k,t,i}+\omega_{k,t,i}) \\ &+ \sum_{j=i+1}^{J_k}\left[\log\Gamma(R_{k,t,i,j}+\omega_{k,t,i}p_{k,t,i,j}) - \log\Gamma(\omega_{k,t,i}p_{k,t,i,j})\right] \\ &+ \log\Gamma(R_{k,t,i,\neg}+\omega_{k,t,i}p_{k,t,i,\neg}) - \log\Gamma(\omega_{k,t,i}p_{k,t,i,\neg}) \end{aligned}

and lp_{\text{tag}} = -\log L_{k,t,i}.

Aerial Survey Likelihood

Implemented in: R/likelihoods.R (get_aerial_survey_like).

Legacy data note: this likelihood remains documented for code fidelity, but the aerial stream is considered legacy in current use.

Let \Sigma = \text{aerial\_cov} + \tau_{\text{aerial}}^2 I, with \tau_{\text{aerial}} = \exp(\text{par\_log\_aerial\_tau}).

Using unscaled predictions \tilde{I}, define:

\log q_{\text{aerial}} = \frac{\mathbf{1}^T\Sigma^{-1}(\log O - \log \tilde{I})}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}

\log \hat{I} = \log \tilde{I} + \log q_{\text{aerial}}

and

\log O \sim \text{MVN}(\log \hat{I}, \Sigma)

In addition, the implementation reports and adds:

lp_{\text{aerial\_tau}} = 0.5\log|\Sigma|

Trolling Survey Likelihood

Implemented in: R/likelihoods.R (get_troll_like).

With \tau_{\text{troll}} = \exp(\text{par\_log\_troll\_tau}) and

\sigma_i = \sqrt{\sigma_i^2_\text{troll}} + \tau_{\text{troll}}^2}

the model fits

\log O_i \sim N(\log \hat{T}_i, \sigma_i), \quad \hat{T}_i = q_{\text{troll}}N_{y_i,1,1}

where q_{\text{troll}} is estimated analytically by weighted least squares each evaluation.

Close-kin Data

Implemented in data preparation and likelihood helpers: R/get-data.R (get_data), R/likelihoods.R (get_POP_like, get_HSP_like, get_GT_like).

The close-kin data are now an agreed dataset for use within the CCSBT Operating Model (OM). While the full dataset contains many details, for modeling purposes the simplified structure is as follows:

Juveniles: There are individuals i \in \Theta with a year of capture y_i, length-at-capture l_i, and either:
- A distribution of ages conditional on length, p(a_i \mid l_i) (via a growth relationship), or
- Direct age a_i (in the case of a known POP).
The cohort year of birth is calculated as c_i = y_i - a_i.
Adults: There are individuals j \in \Xi with a year of capture y_j, length-at-capture l_j, and either:
- A distribution p(a_j \mid l_j), or
- Direct age a_j (as in a POP match).

Defining the Parental Probability

Implemented in: R/likelihoods.R (get_POP_like).

The genetic data allow us to determine whether juvenile i and adult j form a Parent-Offspring Pair (POP) with high certainty. In a simple cartoon version, if there are N adults, the probability that any one is the parent of a juvenile is 2/N (either a mother or a father).

However, in a more realistic model with multiple cohorts and age structure, the following two factors must be considered:

Relative reproductive output varies with age (e.g., maturity, batch fecundity, spawning history).
Relative abundance decreases with age due to mortality and recruitment dynamics.

To compute the probability that adult j is the parent of juvenile i, we balance these effects. Let:

\quad c_i = y_i - a_i: birth year of juvenile i

\quad a_j - a_i: age of adult j in the birth year of juvenile i

\quad \varphi_{a_j - a_i}: reproductive output of the adult at that age

Then the parental probability \pi_{ij} is:

\pi_{ij} = \frac{2 \varphi_{a_j - a_i}}{ \sum_a N_{y_i - a, a} \varphi_a }

where N_{y,a} is the abundance of fish of age a in year y, and \sum_a N_{c_i,a} \varphi_a is the total reproductive output in the birth year of juvenile i.

Close-kin Likelihood

Implemented in: R/likelihoods.R (get_POP_like).

At the individual level, the likelihood of observing the parent-offspring pairs (POPs), denoted \Psi, is a product of independent Bernoulli distributions:

L(\Psi \mid \circ) = \prod_{i \in \Theta} \prod_{j \in \Xi} \pi_{ij}^{k_{ij}} (1 - \pi_{ij})^{1 - k_{ij}}

where k_{ij} = 1 if the \{i, j\} pair is a POP and 0 otherwise.

Given that the full comparison set includes millions of \{i, j\} combinations, this base-level likelihood is computationally infeasible. To simplify:

Group comparisons by:
1. Adult capture age a
2. Adult capture year y
3. Juvenile cohort year c

Let:

\quad \Omega_{c,y,a} be the number of comparisons in group (c, y, a)

\quad \Psi_{c,y,a} be the number of POPs observed in that group

Then the grouped likelihood becomes a binomial likelihood:

L(\Psi_{c,y,a} \mid \cdot) \propto \prod_{c} \prod_{y} \prod_{a} \pi_{c,y,a}^{\Psi_{c,y,a}} (1 - \pi_{c,y,a})^{\Omega_{c,y,a} - \Psi_{c,y,a}}

Standardised Residuals

Implemented status: conceptual diagnostic description; no dedicated close-kin residual helper in current RTMB source.

As in catch-at-age and tagging models, residuals help evaluate model fit.

Under the binomial assumption, the standardised residual for close-kin group (c, y, a) is:

\hat{\varepsilon}_{c,y,a} = \frac{ \Psi_{c,y,a} - \Omega_{c,y,a} \pi_{c,y,a} }{ \sqrt{ \Omega_{c,y,a} \pi_{c,y,a} (1 - \pi_{c,y,a}) } }

If the binomial assumption holds, \hat{\varepsilon}_{c,y,a} should follow a standard normal distribution.

Aggregated Residuals: Higher-Level Approximation

Implemented status: conceptual diagnostic description; no dedicated aggregated-residual helper in current RTMB source.

A key challenge with close-kin data is deciding the appropriate level to evaluate residuals. At the base level, individual parental probabilities \pi_{ij} are often extremely low, with many comparisons and very few POP detections (often only one or two). This structure is uncommon in ecological models, making standard residual analysis (e.g., expecting variance ≈ 1) potentially misleading.

To address this, one more nuanced approach is to aggregate residuals to a higher level — such as cohort c and adult capture year y — by summing across adult ages.

Using the normal approximation to the binomial, the expected number of POPs at this aggregated level is:

\hat{\mu}_{c,y} = \sum_a \Omega_{c,y,a} \pi_{c,y,a}

with corresponding variance:

\hat{\sigma}^2_{c,y} = \sum_a \Omega_{c,y,a} \pi_{c,y,a} (1 - \pi_{c,y,a})

This allows a higher-level standardized residual:

\hat{\varepsilon}_{c,y} = \frac{ \Psi_{c,y} - \hat{\mu}_{c,y} }{ \sqrt{ \hat{\sigma}^2_{c,y} } }

While this normal approximation is useful for visual diagnostics, it is not ideal with very small probabilities or uneven sample sizes. In such cases, it is advisable to compute the exact probability from the sum of binomial variables, especially if assessing how well the OM explains the close-kin data.

Half-Sibling Pairs (HSP)

Implemented in: R/likelihoods.R (get_HSP_like).

For each HSP data record, the model computes a probability from age-specific reproductive output and survival between cohorts (get_HSP_like):

\gamma_{y,a} = \frac{N_{y,2,a}\phi_{y,a}}{\text{SSB}_y}

and

p^{\text{HSP}}_i = \frac{4q_{\text{hsp}}}{\text{SSB}_{c_{\max}}} \sum_a \gamma_{c_{\min},a}\,\text{Surv}(a,c_{\max}-c_{\min})\,\phi_{c_{\max},a'}

with binomial likelihood

K_i \sim \text{Binomial}(N_i, p^{\text{HSP}}_i \times \text{hsp\_false\_negative})

where q_{\text{hsp}} = \exp(\text{par\_log\_hsp\_q}).

Gene Tagging (GT)

Implemented in: R/likelihoods.R (get_GT_like).

For each GT observation (get_GT_like):

p^{\text{GT}}_i = \frac{N^{\text{rel}}_i}{q_{\text{gt}}N_{y^{\text{rel}}_i,1,a^{\text{rel}}_i}}, \quad q_{\text{gt}} = \exp(\text{par\_log\_gt\_q})

Likelihood is binomial (gt_switch = 1) or beta-binomial (gt_switch = 2).

Prior distributions

Implemented in: R/priors.R (get_priors, evaluate_priors), R/recruitment.R (get_recruitment_prior), R/selectivity.R (get_selectivity_prior), and parameter mapping in R/parameters.R (get_map).

The prior term in the objective is:

lp_{\text{prior}} = -\sum_{m} \log p(\theta_m)

where active priors are defined in get_priors() (and may be changed in code). Current defaults are:

\text{par\_log\_psi} \sim N(\log(1.75), 0.122^2)
\text{par\_log\_m10} \sim N(\log(0.1), 0.06^2)
\text{par\_log\_h} \sim N(\log(1.0), 1.5^2)
\text{par\_log\_cpue\_omega} \sim N(0.875, 0.1^2)

In addition to lp_prior, two structured priors are always included:

Recruitment-deviation prior lp_rec from get_recruitment_prior()
Selectivity prior \sum_f lp_{\text{sel},f} from get_selectivity_prior()

Parameters can be fixed by mapping in get_map(), which removes them from estimation (and effectively from prior contribution).

MSY Calculation

Implemented in: R/msy.R (msy_calc, find_msy, read_msy_dat, write_msy_report, print_msy_report).

The current implementation solves an equilibrium two-season, multi-fishery MSY problem using fully-selected fishery mortalities F^{\text{full}}_f.

MSY State Equations

Implemented in: R/msy.R (msy_calc).

Fishery-at-age mortality is:

F_{f,a} = F^{\text{full}}_f s_{f,a}

Seasonal exploitation rates are accumulated as:

ER_{2,a} = \sum_{f=1}^{2} F_{f,a}, \qquad ER_{1,a} = \sum_{f=3}^{n_f} F_{f,a}

To avoid invalid exploitation rates near 1, each seasonal maximum is smoothed with a posfun-style adjustment. Let x_s = 1 - \max_a(ER_{s,a}) and \epsilon = 0.05. Then:

\text{sr\_adj}_s = \begin{cases} x_s, & x_s \ge \epsilon \\ \dfrac{\epsilon}{2 - x_s/\epsilon}, & x_s < \epsilon \end{cases}

\text{adj}_s = \frac{1-\text{sr\_adj}_s}{\max_a(ER_{s,a})}

and ER_{s,a} and F_{f,a} are rescaled by adj. The penalty contribution is:

\text{pen} = 10^6 \sum_s 0.01\,(x_s-\epsilon)^2 \, \mathbf{1}(x_s < \epsilon)

Relative equilibrium survivorship-per-recruit is:

Rel_0 = 1

Rel_a = Rel_{a-1}(1-ER_{1,a-1})(1-ER_{2,a-1})e^{-M_{a-1}}, \quad a=1,\dots,A

Plus-group correction:

Rel_A \leftarrow \frac{Rel_A}{1-(1-ER_{1,A})(1-ER_{2,A})e^{-M_A}}

Spawning biomass per recruit and equilibrium recruitment:

SBR = \sum_{a=0}^{A} Rel_a \phi_a

Rec = \max\left(1,\; \alpha - \frac{\beta}{SBR}\right)

Biomass metrics:

SSB^{\text{msy}} = Spbio = Rec \cdot SBR

Tbio = Rec \sum_{a=2}^{A} Rel_a \, spwt_a

Season-2-start abundance proxy for fisheries 1–4:

Rel2_a = Rel_a(1-ER_{1,a})e^{-0.5M_a}

Catch by fishery:

C_f = \begin{cases} \sum_a Rec \cdot Rel_a \cdot F_{f,a}\cdot cwt_{f,a}, & f \in \{5,6\} \\ \sum_a Rec \cdot Rel2_a \cdot F_{f,a}\cdot cwt_{f,a}, & f \in \{1,2,3,4\} \end{cases}

C^{\text{msy}} = Ctot = \sum_{f=1}^{n_f} C_f

Estimated fishery catch split:

\widehat{p}_f = \begin{cases} C_f/C^{\text{msy}}, & C^{\text{msy}} > 0\\ 0, & C^{\text{msy}} = 0 \end{cases}

Annual total mortality-at-age reported by the routine:

F^{\text{tot}}_a = -\log\left[(1-ER_{1,a})(1-ER_{2,a})\right]

Unfished spawning biomass per recruit:

Rel^0_0 = 1,\qquad Rel^0_a = Rel^0_{a-1}e^{-M_{a-1}}

Rel^0_A \leftarrow \frac{Rel^0_A}{1-e^{-M_A}},\qquad SBR_0 = \sum_{a=0}^{A} Rel^0_a \phi_a

MSY Optimization Objective

Implemented in: R/msy.R (find_msy).

find_msy() estimates F^{\text{full}} using bounded nlminb:

\min_{F^{\text{full}}_f \in [10^{-5},\,0.5]} \left[ -\log(C^{\text{MSY}}+1) + \lambda \sum_f (\widehat{p}_f - p_f^{\text{target}})^2 + \text{pen} \right]

with start values F^{\text{full}}=F_{\text{ini}} and default \lambda = 10^5.

Projection Model

Implemented in: R/projections.R (project_dynamics, project_selectivity, project_rec_devs) with core dynamics from R/dynamics.R (do_dynamics).

Model Structure

Implemented in: R/projections.R (project_dynamics) through projected catch/selectivity inputs to the dynamics engine.

Same as in the conditioning model, except that only four fisheries are included:

LL1 fishery (second season)
LL2 fishery (second season)
Indonesian spawning fishery (first season)
Australian surface fishery (first season)

Population Model

Implemented in: R/projections.R (project_dynamics) via calls to R/dynamics.R (do_dynamics).

Identical to the conditioning model.

Initial Abundances

Implemented in: R/projections.R (project_dynamics) using terminal reconstruction state and projected recruitment deviates from project_rec_devs.

Initial age-specific abundances at the start of the projection are estimated from the conditioning model. To represent process error in recruitment, lognormal autocorrelated error is added to initial abundances for ages 0–2.

Let Y_1 be the first projection year. Then:

N_{Y_1,4} = \hat{N}_{Y_1,4} \cdot \exp(0.4z - 0.08)

N_{Y_1,3} = \hat{N}_{Y_1,3} \cdot \exp(0.4z - 0.08)

N_{Y_1,2} = \hat{N}_{Y_1,2} \cdot \exp(\epsilon_{Y_1})

N_{Y_1,1} = \hat{N}_{Y_1,1} \cdot \exp(\hat{\rho} \epsilon_{Y_1-2} + \epsilon_{Y_1-1})

N_{Y_1,0} = \hat{N}_{Y_1,0} \cdot \exp(\hat{\rho}^2 \epsilon_{Y_1-2} + \hat{\rho} \epsilon_{Y_1-1} + \epsilon_{Y_1})

Where:

z \sim {N}(0,1)
\epsilon_y \sim {N}\left(0, (1 - \hat{\rho}^2) \sigma_R^2\right)
\sigma_R = 0.6 (and an extra lognormal SD = 0.4 is added for \hat{N}_{Y_1-4,0} and \hat{N}_{Y_1-3,0})

These imply:

\tau_{Y_1} = \hat{\tau}_{Y_1} + \hat{\rho}^2 \epsilon_{Y_1-2} + \hat{\rho} \epsilon_{Y_1-1} + \epsilon_{Y_1}

and subsequent years follow:

\tau_{Y_1+1} = \hat{\rho} \tau_{Y_1} + \epsilon_{Y_1+1}, \quad \text{etc.}

This formulation assumes autocorrelated recruitment starting in Y_1-2.
However, it may over-propagate uncertainty from Y_1-3 due to grid-based point estimates.
A sensitivity run with uncorrelated \tau_{Y_1-2} is used to address this.

Selectivity in Projections

Implemented in: R/projections.R (project_selectivity).

Unlike the conditioning model, the projection model does not use random-walk selectivity due to the risk of unrealistic drift. Instead, it starts from the mean of the last 3 estimates and adds autocorrelated process error:

For fishery 1:

s_{1,a,y+1} = s_{1,a,y} \cdot \exp(\varepsilon_{a,y}) \quad \text{for } a_1^{\min,s} \le a \le a_1^{\max,s}, \text{ where } a_1^{\min,s} = 2, a_1^{\max,s} = 17

Error terms follow:

\varepsilon_{a,y} = \rho_{\text{sel}} \varepsilon_{a,y-1} + \sqrt{1 - \rho_{\text{sel}}^2} \, \eta_{a,y}, \quad \eta_{a,y} \sim {N}(0, 0.05^2), \quad \rho_{\text{sel}} = 0.7

Selectivities update annually (not in blocks, as in the conditioning model).

Australian Surface Fishery (Special Case)

Implemented status: legacy/ADMB-style narrative; no dedicated age-3 boost rule in current RTMB projection helpers (R/projections.R).

Let:

P_{y,3} = \frac{N_{y,3}}{\sum_{a=1}^5 N_{y,a}}, \quad \bar{P}_3 = \frac{1}{10} \sum_{y=y_{n2}-9}^{y_{n2}} P_{y,3}

If P_{y,3} \ge \bar{P}_3:

s_{6,y,a} = s_{6,y-1,a} \cdot \exp(\varepsilon_{6,y,a}) \quad \text{for } a = 1, \ldots, 5, \quad \varepsilon_{6,y,a} \sim {N}(0, 0.1^2)

Otherwise, boost selectivity at age 3:

s_{6,y,3} = s_{6,y-1,3} \cdot \left(1 + 0.5 \cdot \frac{\bar{P}_3 - P_{y,3}}{P_{y,3}} \right), \quad s_{6,y,a} = s_{6,y-1,a} \cdot \exp(\varepsilon_{6,y,a}) \text{ for } a = 1,2,4,5

Fishing Mortality in Projections

Implemented in: R/projections.R (project_dynamics) via R/dynamics.R (get_harvest_rate, do_dynamics).

Year subscripts are omitted for simplicity. Catch is defined as:

C = \sum_a \sum_f s_{f,a} F_f N_a

For each fishery f:

C = \sum_f C_f, \quad \text{where } C_f = \left( \sum_a s_{f,a} N_a \right) F_f

Solving for F_f:

F_f = \frac{C_f}{\sum_a s_{f,a} N_a}

Additionally:

Catch-at-age: C_{f,a} = s_{f,a} F_f N_a
Catch at age: C_a = \left( \sum_f s_{f,a} F_f \right) N_a

Note: A problem occurs if \sum_f s_{f,a} F_f > 1, which implies C_a > N_a.

Harvest Rate Bounding and Selectivity Adjustment

Implemented status: partial. Current RTMB uses posfun penalty in R/dynamics.R (get_harvest_rate); explicit A1-A6 rescaling routine is legacy narrative.

In early MP trials, the model enforced a hard upper bound on age-specific exploitation rates:

\sum_f s_{f,a} F_f \le 0.99

When this bound was exceeded, the model reduced the catch at that age, but not the selectivity or fishing mortality for other ages, which led to underutilization of TAC.

An improved method adjusts selectivities at the age(s) where the bound is exceeded (from MP Workshop II, Queenstown, April 2003).

Case of One Fleet (or Non-Overlapping Selectivities)

Implemented status: legacy/ADMB analytical description; no dedicated standalone helper in current RTMB source.

Assume a single fleet. If F \le 0.9, no adjustment is needed. If F > 0.9, apply:

C = \sum_a g(s_a F) N_a \tag{A1}

where the adjusted selectivity is:

s_a^* = \frac{g(s_a F)}{F} \tag{A2}

The proposed function g(x) is:

g(x) = \begin{cases} x & \text{if } x \le 0.9 \\ 0.9 + 0.08 \left[ 1 - \exp(-10(x - 0.9)) \right] & \text{if } x > 0.9 \end{cases} \tag{A3}

Notes: - g(x) < 1 ensures C_a = g(s_a F) N_a < N_a - g(x) is continuous and differentiable at x = 0.9 - Newton-Raphson is used to solve equation (A1) for F

Extension to Multiple Fleets

Implemented status: partially represented by multi-fleet harvest in R/dynamics.R (get_harvest_rate); the explicit A4-A6 derivation is legacy documentation.

If:

\sum_f s_{f,a} F_f \le 0.9 \quad \text{for all } a

then proceed as normal. Otherwise, for any age a:

C = \sum_a g\left(\sum_f s_{f,a} F_f\right) N_a \tag{A4}

To proportionally reduce selectivity for each fleet at age a, define:

s^*_{f,a} = s_{f,a} \cdot \left[ \frac{g\left( \sum_f s_{f,a} F_f \right)}{\sum_f s_{f,a} F_f} \right] \tag{A5}

Then:

C_{f,a} = s^*_{f,a} F_f N_a = s_{f,a} F_f N_a \cdot \left[ \frac{g\left( \sum_f s_{f,a} F_f \right)}{\sum_f s_{f,a} F_f} \right] \tag{A6}

This guarantees:

C_a = \sum_f C_{f,a} = g\left( \sum_f s_{f,a} F_f \right) N_a < N_a

Solving for F_f When Bounds Are Exceeded

Implemented status: current RTMB computes F_f algebraically by fishery in R/dynamics.R (get_harvest_rate) and applies posfun penalty rather than an explicit multivariate root-solver.

Once selectivity is adjusted via:

C_a = \sum_f C_{f,a} = \sum_f s^*_{f,a} F_f N_a = g\left(\sum_f s_{f,a} F_f \right) N_a

Then the coupled non-linear equations must be solved for each F_f:

C_f = \sum_a C_{f,a} = \sum_a s^*_{f,a} F_f N_a

Substituting s^*_{f,a} gives:

C_f = \sum_a s_{f,a} F_f \left[ \frac{g\left( \sum_{f'} s_{f',a} F_{f'} \right)}{\sum_{f'} s_{f',a} F_{f'}} \right] N_a

This system is solved via a multivariate root-finding method, e.g., extended Newton-Raphson.

CPUE and Aerial Survey Deviations

Implemented status: partial. CPUE projection deviations are simulated in R/projections.R (project_dynamics) using R/likelihoods.R (get_cpue_like); aerial-deviation text below is legacy narrative.

CPUE Deviations

Implemented in: R/projections.R (project_dynamics) plus CPUE predictor from R/likelihoods.R (get_cpue_like).

CPUE is simulated with autocorrelated trends in catchability, superimposed on a 0.5% annual increase in q.

Empirical autocorrelation \rho_{\text{cpue}} from the 1969–2008 time series is used
For standard deviation, choose \max(0.2, \text{empirical SD})
Override option: pass a value via -cpuestd xx in the projection code

Aerial Survey Deviations

Implemented status: legacy narrative; no dedicated aerial-deviation simulation helper in current R/projections.R.

Aerial indices are simulated by adding lognormal deviations. Initially, empirical variance and autocorrelation were used, but:

When \tau_{\text{aerial}} was estimated as free, it produced high SDs (0.50–0.60)
Residual autocorrelation was very low

To stabilize, \tau_{\text{aerial}} was fixed to 0.18, which with sampling error gives SD \approx 0.30 in log space.

Simulation Grid for Deviation Parameters

Implemented status: scenario documentation; no dedicated grid helper for this table in current RTMB code.

Scenario	SD (CPUE)	\rho_{\text{cpue}}
Base	0.20	Empirical, correlated to last residual
highCpueCV	0.30	Empirical, correlated to last residual

For the aerial survey, autocorrelation is set to zero due to short residual history and poor reliability.

Lags in Data Availability and Schedules of TAC Changes

Implemented status: policy/process documentation; not encoded as executable scheduling logic in current RTMB source.

When a TAC for year y is determined, the projection code assumes the following data are available:

Catch data up to year y - 3
TAC up to year y - 1
CPUE up to year y - 3
Age-composition data up to year y - 3
Gene tagging index (age-2 abundance) in year y - 4
Pop-HSP index of SSB for year y - 7

Options for TAC Update Frequency

Implemented status: policy documentation; not represented by a dedicated function in R/*.R.

OMMP8 recommended calculating the first TAC under the new MP in 2020 (for implementation in 2021), to allow more time for stakeholder consultation. This skips the usual lag for only the first decision.

The control file includes four TAC update schedule options:

Option (a): First TAC in 2021, then every year — no lag
Option (b): First TAC in 2021, then every other year — no lag
Option (c): First TAC in 2021, then every 3 years — no lag
Option (d): First TAC in 2021, then every 3 years — with one-year lag

Example Schedule: Options (c) and (d)

Implemented status: illustrative schedule table; not generated by a dedicated function in current RTMB source.

In all cases, the first TAC is calculated in 2019, using:

CPUE: up to 2018
Gene tagging: 2016, 2017
POP/HSP SSB: up to 2014

Decision Year	Catch Data	CPUE Data	Gene Tagging	POP/HSP SSB	TAC Year	TAC Change?
2017	2016	2016	—	2012	2019	hardwired
2018	2017	2017	2016	2013	2020	hardwired
2019	2018	2018	2017	2014	2021	Yes
2020	2019	2019	2018	2015	2020	No
2021	2020	2020	2019	2016	2021	No
2022	2021	2021	2020	2017	2022	Yes
2023	2022	2022	2021	2018	2023	No

Note: Shaded years (2016–2018) represent real data available in 2019 (not OM-simulated).

Tuning Levels

Implemented status: policy/tuning documentation; not encoded in a dedicated tuning module in current RTMB source.

The SFMWG proposed six tuning options based on the probability of rebuilding the stock to 20% of SSB_0 in 25 or 30 years. Short-term checkpoints were also set at 12 or 15 years from present, to check the probability of rebuilding to 10% of SSB_0 or to 2 \times SSB_{2009}.

Tuning Options and Checkpoint Years

Implemented status: documentation table; not generated by a dedicated function in current RTMB source.

Tuning Level	Tuning Year	P[SSB ≥ 0.2 SSB_0]	Short-term Checkpoint Year
1	2035	0.60	2022
2	2035	0.70	2022
3	2035	0.90	2022
4	2040	0.60	2025
5	2040	0.70	2025
6	2040	0.90	2025

Maximum–Minimum TAC Changes

Implemented status: policy constraint discussion; bounds can be explored by user-specified projection inputs, not by a dedicated hardcoded helper.

The SFMWG recommended examining two options for TAC change bounds:

Maximum change: 3000 t or 5000 t
Minimum change: 100 t

These are not hardcoded, so users can explore alternative values.
However, prior analyses indicated that these bounds act as binding constraints, especially at high tuning levels where the TAC was often pushed against the maximum limit.

Appendix: Transition from ADMB to RTMB

Implemented in: R/parameters.R (get_map), R/model.R (sbt_model), and workflow vignettes/scripts that construct MakeADFun objects.

This model was originally implemented in ADMB using control file flags. These have been replaced in RTMB with mapped parameters and script-based control. Key transitions include:

ADMB Flag/Control Concept	RTMB Equivalent
`flag_rec_devs = 1`	`rec_devs` not mapped (`map = list(rec_devs = ...)`)
`flag_est_q = 0`	`map = list(log_q = factor(NA))`
Estimation phase definitions	Controlled in R script via `parameters` and `map`
Profile runs via `-mcmc`	Handled via `tmbstan` or `adnuts` in R

This makes the model setup more reproducible and scriptable, and decouples the estimation logic from text-based control files.