Read replica reconcile misclassifies existing replicas and keeps rejoined replicas NotReady

### Description

We hit two related bugs with async read replicas managed by the Kubernetes operator:

1. An existing read replica that is already present in InnoDB Cluster metadata can be classified as `JOINABLE` instead of `REJOINABLE`, so the operator calls `Cluster.addReplicaInstance()` and gets:
   `MYSQLSH 51305: Target instance already part of this InnoDB Cluster`

2. After the replica is manually recovered with `Cluster.rejoinInstance()`, the operator still writes `status=OFFLINE` into the pod annotation `mysql.oracle.com/membership-info` and sets the readiness gate `mysql.oracle.com/ready` to `False`, even though `Cluster.status()` reports the read replica as `ONLINE`.

This leaves the read replica pod `Running` but not `Ready`, so Services without `publishNotReadyAddresses` lose their endpoints.

### Operator / server versions

- Helm chart: `mysql-operator` `2.2.8`
- Operator image: `container-registry.oracle.com/mysql/community-operator:9.7.0-2.2.8`
- MySQL server image: `container-registry.oracle.com/mysql/community-server:9.6.0`
- InnoDBCluster API: `mysql.oracle.com/v2`

### Cluster shape

- 1 group-member primary
- 1 read replica
- `router.instances = 0`

### What we observed

- `Cluster.status({extended:1})` from the primary shows the read replica under `defaultReplicaSet.topology.<primary>.readReplicas` with:
  - before manual recovery: `status: OFFLINE`, `instanceErrors: ["WARNING: Read Replica's replication channel is stopped. Use Cluster.rejoinInstance() to restore it."]`
  - after manual recovery: `status: ONLINE`
- While the replica was OFFLINE, the operator repeatedly logged:

```text
Setting up '...-rr-0...:3306' as a Read Replica of Cluster '...'
ERROR: The instance '...-rr-0...:3306' is already part of this Cluster. A new Read-Replica must be created on a standalone instance.
MYSQLSH 51305: Target instance already part of this InnoDB Cluster
```

- After manual `Cluster.rejoinInstance()`, replication recovered (`Replica_IO_Running=Yes`, `Replica_SQL_Running=Yes`, `Seconds_Behind_Source=0`) and `Cluster.status()` reported the read replica `ONLINE`, but the operator still kept the pod readiness gate false until we manually patched pod status.

### Suspected root cause

Two code paths in the controller seem to assume that every managed instance is a Group Replication (GR) member, which does not hold for read replicas:

1. `diagnose_cluster_candidate()` checks membership with `cluster.status()["defaultReplicaSet"]["topology"].keys()`.
   This only covers GR members. Read replicas live under `topology[*]["readReplicas"]`, so an existing OFFLINE read replica can be treated as not-a-member and become `JOINABLE` instead of `REJOINABLE`.

2. `probe_member_status()` uses `shellutils.query_membership_info()`, which only queries `performance_schema.replication_group_members`.
   Async read replicas are not GR members, so the query returns no row and the result falls back to `status="OFFLINE"`. That value then ends up in:
   - pod annotation `mysql.oracle.com/membership-info` (as `status=OFFLINE`)
   - readiness gate `mysql.oracle.com/ready`, which is set to `False`

In `trunk`, these paths still appear unchanged in:
- `mysqloperator/controller/diagnose.py`
- `mysqloperator/controller/innodbcluster/cluster_controller.py`
- `mysqloperator/controller/shellutils.py`
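
To illustrate point 1, here is a minimal sketch of a membership check that also looks under `readReplicas`. It is not the operator's code: `known_endpoints` is a hypothetical helper, the hostnames are placeholders, and it assumes the `readReplicas` object is keyed by instance endpoint, which is how it appeared in our `Cluster.status()` output.

```python
def known_endpoints(status: dict) -> set:
    """Collect endpoints of GR members and of the read replicas nested under them."""
    topology = status["defaultReplicaSet"]["topology"]
    endpoints = set(topology.keys())
    for member in topology.values():
        # Read replicas hang off the member they replicate from
        # (assumed to be keyed by endpoint, as in our status output).
        endpoints.update(member.get("readReplicas", {}).keys())
    return endpoints


# Hypothetical, trimmed status document; hostnames are placeholders.
sample_status = {
    "defaultReplicaSet": {
        "topology": {
            "mycluster-0:3306": {
                "status": "ONLINE",
                "readReplicas": {
                    "mycluster-rr-0:3306": {"status": "OFFLINE"},
                },
            },
        },
    },
}

# The GR-only view misses the read replica; the combined view finds it.
gr_only = set(sample_status["defaultReplicaSet"]["topology"].keys())
print("mycluster-rr-0:3306" in gr_only)                         # False
print("mycluster-rr-0:3306" in known_endpoints(sample_status))  # True
```

With a check along these lines, an OFFLINE read replica that is already in the metadata would be recognised as an existing instance rather than as a candidate for `addReplicaInstance()`.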

### Minimal reproduction

1. Deploy an `InnoDBCluster` with:
   - `instances: 1`
   - one `readReplicas` entry with `instances: 1`
   - `router.instances: 0`
2. Stop the async read replica channel on the read replica:
   `STOP REPLICA FOR CHANNEL 'read_replica_replication';`
3. Trigger operator reconciliation for the read replica pod (for example, recreate the pod or let the pod create handler run).
4. Observe that the operator tries `addReplicaInstance()` and gets `MYSQLSH 51305`.
5. Manually run `Cluster.rejoinInstance('<rr-endpoint>')`.
6. Observe that `Cluster.status()` shows the replica `ONLINE`, but the pod annotation and readiness stay `OFFLINE` / NotReady.

### Expected behavior

- Existing read replicas that are already present in cluster metadata should be classified as `REJOINABLE`, not `JOINABLE`.
- After a successful rejoin, the operator should update read replica membership/readiness using a read-replica-aware status source, and the pod should become `Ready`.
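
As a rough idea of what a read-replica-aware status source could look like, the sketch below derives readiness from the `readReplicas` entry in the cluster status instead of from `performance_schema.replication_group_members`. The function names and sample data are hypothetical, and it again assumes that `readReplicas` is keyed by endpoint; the real fix may well use a different source.

```python
from typing import Optional


def read_replica_status(status: dict, endpoint: str) -> Optional[str]:
    """Return the status string of a read replica, or None if it is not listed."""
    topology = status["defaultReplicaSet"]["topology"]
    for member in topology.values():
        replica = member.get("readReplicas", {}).get(endpoint)
        if replica is not None:
            return replica.get("status")
    return None


def read_replica_ready(status: dict, endpoint: str) -> bool:
    """Treat a read replica as ready only when the cluster reports it ONLINE."""
    return read_replica_status(status, endpoint) == "ONLINE"


# Hypothetical, trimmed status document after a manual rejoin.
status_after_rejoin = {
    "defaultReplicaSet": {
        "topology": {
            "mycluster-0:3306": {
                "status": "ONLINE",
                "readReplicas": {
                    "mycluster-rr-0:3306": {"status": "ONLINE"},
                },
            },
        },
    },
}

print(read_replica_status(status_after_rejoin, "mycluster-rr-0:3306"))  # ONLINE
print(read_replica_ready(status_after_rejoin, "mycluster-rr-0:3306"))   # True
print(read_replica_ready(status_after_rejoin, "unknown:3306"))          # False
```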

