
Audio/Video Calls

WebRTC 1:1 audio/video calls with end-to-end encryption via Insertable Streams. SDK is transport-agnostic for TURN: any TURN provider returning standard iceServers JSON works (Cloudflare Realtime TURN, Twilio NTS, Metered, self-hosted coturn). The relay server's /api/v1/calls/ice-config is the single injection point — clients never hard-code TURN.

Initialize Call Module

The call module needs your Ed25519 identity keys (for signing outgoing signaling) and your aliasId (for envelope routing). Initialize after authentication:

ts
import { loadIdentity, deriveIdentity } from '@daomessage_sdk/sdk'

const stored = await loadIdentity()
const identity = deriveIdentity(stored.mnemonic)

client.initCalls({
  signingPrivKey: identity.signingKey.privateKey,
  signingPubKey:  identity.signingKey.publicKey,
  myAliasId:      stored.aliasId,
  // alwaysRelay: defaults to FALSE since 1.0.11
  //   - false  → iceTransportPolicy='all' (host/srflx/relay all allowed)
  //              Direct P2P incurs no TURN bandwidth cost, but both peers see each other's public IP
  //   - true   → iceTransportPolicy='relay' (force TURN relay, hides IPs)
  //              Paid privacy mode; audio is billed at roughly 100 MB of bandwidth per hour
  // alwaysRelay: true,
})

Make a Call

ts
// Audio call
await client.calls.call('u87654321', { audio: true, video: false })

// Video call
await client.calls.call('u87654321', { audio: true, video: true })

Receive a Call

ts
// 1.0.12+: the callback carries an isVideo argument so the UI can pick the audio or video ringing screen
client.calls.onIncomingCall = (fromAlias, isVideo) => {
  showIncomingCallDialog(fromAlias, isVideo)
}

// User accepts
await client.calls.answer()

// User rejects
client.calls.reject()

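isVideo is derived automatically from the m=video line of the offer SDP and does not depend on what the caller declares. If the offer contains a video track, isVideo is true; for an audio-only offer it is false.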
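The detection rule can be sketched as a small pure function. This is an illustration only: `offerHasVideo` is a hypothetical helper name, not an SDK export, and the SDK's real check may differ in detail.

```typescript
// Hypothetical helper (not an SDK export): reports whether an SDP offer
// contains a video media section, i.e. a line that starts with "m=video ".
function offerHasVideo(sdp: string): boolean {
  // SDP lines are separated by CRLF; a bare LF is tolerated here.
  return sdp.split(/\r?\n/).some((line) => line.startsWith('m=video '))
}

const audioOnlyOffer = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=sendrecv',
].join('\r\n')

const videoOffer =
  audioOnlyOffer + '\r\nm=video 9 UDP/TLS/RTP/SAVPF 96\r\na=sendrecv'

console.log(offerHasVideo(audioOnlyOffer)) // false
console.log(offerHasVideo(videoOffer))     // true
```

Matching at the start of a line avoids false positives from attribute lines that merely mention "m=video".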

Since 1.0.11, observable subscriptions are recommended; they avoid the timing race that ref assignment causes in React / Vue:

ts
// Subscribe to call state
const stateSub = client.calls.observeState().subscribe(state => {
  // 'idle' | 'calling' | 'ringing' | 'connecting' | 'connected'
  // | 'hangup' | 'rejected' | 'ended'
  updateCallUI(state)
})

// Subscribe to local/remote streams
const localSub = client.calls.observeLocalStream().subscribe(stream => {
  if (stream && localVideoRef.current) localVideoRef.current.srcObject = stream
})
const remoteSub = client.calls.observeRemoteStream().subscribe(stream => {
  if (stream && remoteVideoRef.current) remoteVideoRef.current.srcObject = stream
})

// Unsubscribe
stateSub.unsubscribe()
localSub.unsubscribe()
remoteSub.unsubscribe()

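Why observables instead of the onLocalStream callback: getUserMedia inside answer() resolves very quickly (<100 ms). If the UI assigns the callback as mod.onLocalStream = (s) => ref.current.srcObject = s, the React video element may not be mounted yet when the stream arrives, so ref.current is null, the stream is silently dropped, and the local preview stays blank. An observable subscription is a hot stream: subscribing at any time yields the current value, so a ref that is remounted on a React re-render is synced immediately.

The callback-style API (onLocalStream / onRemoteStream) is kept for backward compatibility but is not recommended.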
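To make the "late subscriber still gets the current value" behavior concrete, here is a minimal stand-in for such a stream. `CurrentValueStream` is a hypothetical name used for illustration; it is not the SDK's implementation, and strings stand in for MediaStream objects.

```typescript
// Minimal stand-in for a stream that replays its current value on subscribe.
// Hypothetical class for illustration; not the SDK's actual implementation.
type Listener<T> = (value: T) => void

class CurrentValueStream<T> {
  private listeners: Listener<T>[] = []
  private current: T

  constructor(initial: T) {
    this.current = initial
  }

  // Store the new value and notify everyone currently subscribed.
  next(value: T): void {
    this.current = value
    this.listeners.slice().forEach((listener) => listener(value))
  }

  // Deliver the current value immediately, then all later values.
  subscribe(listener: Listener<T>): { unsubscribe: () => void } {
    this.listeners.push(listener)
    listener(this.current)
    return {
      unsubscribe: () => {
        this.listeners = this.listeners.filter((l) => l !== listener)
      },
    }
  }
}

const localStream = new CurrentValueStream<string | null>(null)
localStream.next('stream-A') // arrives before any UI is mounted

const seen: Array<string | null> = []
const sub = localStream.subscribe((s) => { seen.push(s) }) // late subscriber
localStream.next('stream-B')
sub.unsubscribe()
localStream.next('stream-C') // no longer delivered

console.log(seen) // [ 'stream-A', 'stream-B' ]
```

With a plain callback, 'stream-A' would have been lost because nothing was listening when it arrived.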

Hang Up

ts
client.calls.hangup()

Error Handling

ts
client.calls.onError = (err) => {
  console.error('[Calls]', err.name, err.message)
}

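Common errors:

  • NotAllowedError: the user denied microphone/camera permission
  • getUserMedia timeout after 6000ms: timeout guard added in 1.0.3+. In some Android Chrome scenarios getUserMedia neither resolves nor rejects; the SDK automatically falls back to audio-only and throws if that also fails
  • answer() called while already answering: the re-entrancy lock added in 1.0.10+ was triggered; the UI should disable the accept button to prevent repeated taps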
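One way an app might route these errors to UI behavior is sketched below. The action names and the `classifyCallError` helper are hypothetical, and matching on message text is an assumption based only on the messages quoted above; the SDK may expose a more robust discriminator.

```typescript
// Hypothetical UI actions; none of these names come from the SDK.
type CallErrorAction =
  | 'prompt-permission'   // ask the user to grant mic/camera access
  | 'media-unavailable'   // getUserMedia hung even after the audio-only fallback
  | 'ignore-duplicate'    // second tap on "accept"; nothing to show
  | 'generic'

// Classify an error by the name/message patterns listed above.
// ASSUMPTION: message prefixes are stable across SDK versions.
function classifyCallError(err: { name: string; message: string }): CallErrorAction {
  if (err.name === 'NotAllowedError') return 'prompt-permission'
  if (err.message.startsWith('getUserMedia timeout')) return 'media-unavailable'
  if (err.message.includes('already answering')) return 'ignore-duplicate'
  return 'generic'
}

console.log(classifyCallError({ name: 'NotAllowedError', message: 'Permission denied' }))
// prompt-permission
console.log(classifyCallError({ name: 'Error', message: 'getUserMedia timeout after 6000ms' }))
// media-unavailable
```

A handler assigned to client.calls.onError could call this and switch on the result.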

Call Flow

Caller                    Relay Server                   Callee
  │                           │                            │
  │── call_offer (SDP+sig) ──▶│── call_offer ──────────────▶│
  │                           │                            │
  │◀── call_answer (SDP) ─────│◀── call_answer (SDP+sig) ──│
  │                           │                            │
  │◀──────── ICE candidates ──│◀── ICE candidates ─────────│
  │── ICE candidates ────────▶│── ICE candidates ──────────▶│
  │                           │                            │
  │◀═══════ WebRTC P2P (or TURN relay) ═══════════════════▶│

The relay server forwards blindly: it neither decrypts the payload nor parses the SDP (guaranteed by E2EE).

ICE Configuration (GET /api/v1/calls/ice-config)

Response contract

The format is compatible with the standard WebRTC RTCConfiguration.iceServers. Any backend that implements this contract can replace the relay-server's default TURN provider:

json
{
  "ttl": 600,
  "ice_transport_policy": "all",
  "ice_servers": [
    { "urls": ["stun:turn.example.com:3478"] },
    {
      "urls": [
        "turn:turn.example.com:3478?transport=udp",
        "turn:turn.example.com:3478?transport=tcp",
        "turns:turn.example.com:5349?transport=tcp",
        "turns:turn.example.com:443?transport=tcp"
      ],
      "username":   "<ephemeral>",
      "credential": "<ephemeral>"
    }
  ]
}
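
  • ttl: lifetime of the credentials in seconds; the SDK uses it for local caching
  • ice_transport_policy: "all" (P2P with TURN fallback) or "relay" (force TURN)
  • Each ice_servers[*] entry uses the standard WebRTC structure, so the client is unaware of which TURN vendor sits behind the backend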
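As a rough sketch of how a client could turn this response into a peer-connection configuration with a cache expiry: `toRtcConfiguration` and the local `RtcConfigLike` type are hypothetical names, and letting alwaysRelay override the server's policy is an assumption, not documented SDK behavior.

```typescript
// Response shape as documented above.
interface IceServerEntry { urls: string[]; username?: string; credential?: string }
interface IceConfigResponse {
  ttl: number
  ice_transport_policy: 'all' | 'relay'
  ice_servers: IceServerEntry[]
}

// Local mirror of the relevant RTCConfiguration fields, so the sketch
// does not depend on DOM typings.
interface RtcConfigLike {
  iceServers: IceServerEntry[]
  iceTransportPolicy: 'all' | 'relay'
}

// Hypothetical helper: maps snake_case fields to RTCConfiguration names and
// computes when the cached credentials expire.
// ASSUMPTION: a client-side alwaysRelay=true overrides the server policy.
function toRtcConfiguration(
  resp: IceConfigResponse,
  nowMs: number,
  alwaysRelay: boolean = false,
): { config: RtcConfigLike; expiresAtMs: number } {
  return {
    config: {
      iceServers: resp.ice_servers.map((s) => ({ ...s })),
      iceTransportPolicy: alwaysRelay ? 'relay' : resp.ice_transport_policy,
    },
    expiresAtMs: nowMs + resp.ttl * 1000,
  }
}

const sampleResponse: IceConfigResponse = {
  ttl: 600,
  ice_transport_policy: 'all',
  ice_servers: [
    { urls: ['stun:turn.example.com:3478'] },
    {
      urls: ['turn:turn.example.com:3478?transport=udp'],
      username: '<ephemeral>',
      credential: '<ephemeral>',
    },
  ],
}

const result = toRtcConfiguration(sampleResponse, 1000000)
console.log(result.config.iceTransportPolicy) // all
console.log(result.expiresAtMs)               // 1600000
```

In a browser, result.config could be passed to new RTCPeerConnection(...) as long as the cached entry has not expired.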

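Recommended backend: Cloudflare Realtime TURN

Since 1.0.11 the official reference implementation uses Cloudflare. If you self-host the relay, this is the path of least effort:

bash
# 1. Enable Realtime in the Cloudflare Dashboard
#    https://dash.cloudflare.com/?to=/:account/realtime/turn-servers
# 2. Create a TURN Key and note the Key ID and API Token
# 3. Set these in the relay-server's .env: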
CF_TURN_KEY_ID=xxxxxxxxxxxxxxxxxxxx
CF_TURN_API_TOKEN=yyyyyyyyyyyyyyyyyyyyy

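The relay-server's HandleICEConfig calls the Cloudflare API first to obtain iceServers and caches the result for 9 minutes. Pricing: $0.05/GB outbound; a 1-hour audio call costs about $0.003 (under 0.02 CNY). There are 330+ anycast locations worldwide; users in China are routed to the Hong Kong / Singapore nodes, with 50-150 ms latency.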

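Other supported backends (TURN_HOST kept for compatibility)

If CF_TURN_KEY_ID is not set in the relay-server's .env, it falls back automatically to:

  1. The TURN_HOST + TURN_SECRET environment variables (self-hosted coturn with HMAC-SHA1 ephemeral credentials)
  2. Public STUN (direct P2P only; development/test mode)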
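For readers unfamiliar with coturn's shared-secret scheme (its use-auth-secret mode), the ephemeral credentials are conventionally derived as shown below. This is a sketch of that general scheme under Node.js, not the relay-server's code; `makeTurnCredentials`, the secret, and the user id are illustrative values.

```typescript
import { createHmac } from 'node:crypto'

// coturn shared-secret credentials:
//   username   = "<unix expiry time>:<user id>"
//   credential = base64(HMAC-SHA1(secret, username))
// Hypothetical helper name; secret and user id below are placeholders.
function makeTurnCredentials(
  secret: string,
  userId: string,
  ttlSeconds: number,
  nowSeconds: number,
): { username: string; credential: string } {
  const username = `${nowSeconds + ttlSeconds}:${userId}`
  const credential = createHmac('sha1', secret).update(username).digest('base64')
  return { username, credential }
}

const creds = makeTurnCredentials('example-turn-secret', 'u87654321', 600, 1700000000)
console.log(creds.username) // 1700000600:u87654321
console.log(creds.credential.length) // 28 (base64 of a 20-byte SHA-1 MAC)
```

Because the expiry is embedded in the username and covered by the MAC, coturn can validate the credential without a database lookup.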

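Pitfalls of self-hosted coturn (for reference; not recommended for production):

  • denied-peer-ip=172.16.0.0-172.31.255.255 also blocks the default AWS VPC range and causes CREATE_PERMISSION 403; an allowed-peer-ip allowlist does not fix it
  • The EC2 network interface has a private IP (172.31.x.x), so relay-ip=18.142.189.254 fails with EADDRNOTAVAIL; advertise the public IP with --external-ip only
  • When both ends call from the same WiFi, coturn sees "peer IP = own IP", treats it as a loop, and rejects it; this requires allow-loopback-peers + cli-password

Cloudflare TURN sidesteps all of these pitfalls.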

E2EE for Calls

By default the SDK encrypts audio/video frames with Insertable Streams (WebRTC Encoded Transform):

ts
// Applied automatically by CallModule — no manual setup needed
// Each RTP frame is encrypted with AES-256-GCM before leaving the device

Even if the TURN server is compromised or hijacked, an observer sees only encrypted RTP and cannot recover the audio or video.

Signaling Signature (crypto_v=2, 2026-04 hardening)

All call_* frames are Ed25519-signed and AES-GCM-encrypted. Plaintext signaling (historical crypto_v=1) is no longer accepted.

Automatic pipeline on send:

  1. Attach _ts (current time) + _nonce (16 random bytes)
  2. Ed25519 sign the full payload using sender's identity private key
  3. AES-GCM encrypt the signed blob with the ECDH session key
  4. Outer envelope carries only route fields (type, to, from, call_id, crypto_v:2)

Automatic pipeline on receive:

  1. Decrypt with the session key (must exist; no plaintext fallback)
  2. Verify Ed25519 signature using peer's identity public key
  3. Check |now - _ts| < 60s (replay window)
  4. Check _nonce not seen in last 5 minutes (replay cache)
  5. Check inner.from === envelope.from, inner.call_id === envelope.call_id

Any failure → frame silently dropped. Defends against MITM SDP injection, signaling replay, and envelope tampering by a compromised relay.
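Checks 3 to 5 of the receive pipeline (timestamp window, nonce cache, envelope match) can be sketched as follows. This is a minimal sketch, not the SDK's code: `ReplayGuard` is a hypothetical name, and it assumes _ts is in milliseconds and _nonce is carried as a string.

```typescript
// ASSUMPTIONS: _ts is Unix time in milliseconds, _nonce is a hex/base64 string.
interface InnerSignal { from: string; call_id: string; _ts: number; _nonce: string }
interface Envelope { from: string; call_id: string }

const TS_WINDOW_MS = 60 * 1000        // |now - _ts| must be below this
const NONCE_TTL_MS = 5 * 60 * 1000    // nonces are remembered this long

class ReplayGuard {
  private seen = new Map<string, number>() // nonce -> time it was accepted (ms)

  // Returns true when the decrypted, signature-verified frame may be processed.
  accept(inner: InnerSignal, envelope: Envelope, nowMs: number): boolean {
    // Evict nonces older than the replay-cache lifetime.
    this.seen.forEach((seenAt, nonce) => {
      if (nowMs - seenAt >= NONCE_TTL_MS) this.seen.delete(nonce)
    })
    if (Math.abs(nowMs - inner._ts) >= TS_WINDOW_MS) return false   // stale or future
    if (this.seen.has(inner._nonce)) return false                    // replayed
    if (inner.from !== envelope.from) return false                   // envelope tampered
    if (inner.call_id !== envelope.call_id) return false
    this.seen.set(inner._nonce, nowMs)
    return true
  }
}

const guard = new ReplayGuard()
const now = 1000000
const frame: InnerSignal = { from: 'u1', call_id: 'c1', _ts: now, _nonce: 'aabbcc' }
const envelope: Envelope = { from: 'u1', call_id: 'c1' }

console.log(guard.accept(frame, envelope, now))        // true
console.log(guard.accept(frame, envelope, now + 1000)) // false (nonce already seen)
```

Because the 60 s timestamp window is shorter than the 5-minute nonce lifetime, a frame whose nonce has been evicted is already rejected by the timestamp check.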

Important Notes

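  • Call signaling runs over WebSocket, sharing the channel with IM messages; no extra port is opened
  • alwaysRelay defaults to false (behavior change in 1.0.11+); to force TURN for privacy mode, pass true explicitly at construction
  • The callback-style onLocalStream / onRemoteStream API has a React ref timing race; prefer the subscription-style observeLocalStream / observeRemoteStream API
  • Repeated-tap protection: since 1.0.10 the SDK's answer() holds an internal _answering lock; the UI should also disable the button
  • Cross-platform ICE candidates (1.0.19+): the Android SDK sends call_ice with three fields, candidate=string + sdp_mid + sdp_mline; PWA SDK 1.0.19+ normalizes both formats automatically; Android↔PWA calls require both SDKs to be ≥ 1.0.19
  • Canonical signaling signatures (1.0.20+): signSignal / verifySignal now use recursively sorted canonical JSON (the previous JSON.stringify(_, keys.sort()) sorted only top-level keys, so signaling with nested objects such as call_ice failed to verify across platforms); Android↔PWA calls require both SDKs to be ≥ 1.0.20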
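The idea behind recursively sorted canonical JSON can be illustrated like this. It is a minimal sketch: `canonicalJson` is a hypothetical name, and the SDK's exact canonical form (number formatting, handling of undefined, and so on) is not specified here.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Serialize with object keys sorted at every nesting level, so two platforms
// produce byte-identical input for the signature regardless of key order.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value)
  if (Array.isArray(value)) {
    return '[' + value.map((item) => canonicalJson(item)).join(',') + ']'
  }
  const obj: { [key: string]: Json } = value
  const body = Object.keys(obj)
    .sort()
    .map((key) => JSON.stringify(key) + ':' + canonicalJson(obj[key]))
    .join(',')
  return '{' + body + '}'
}

// Same call_ice payload, keys emitted in different orders by two platforms.
const fromPwa: Json = { type: 'call_ice', candidate: { sdp_mid: '0', candidate: 'x' } }
const fromAndroid: Json = { candidate: { candidate: 'x', sdp_mid: '0' }, type: 'call_ice' }

console.log(canonicalJson(fromPwa))
// {"candidate":{"candidate":"x","sdp_mid":"0"},"type":"call_ice"}
console.log(canonicalJson(fromPwa) === canonicalJson(fromAndroid)) // true
```

Array order is preserved on purpose: only object keys are unordered in JSON, so only they are sorted.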

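Android native integration notes (WebRTC for Android)

The Android SDK does not yet wrap a CallModule class, so the app layer must manage PeerConnection itself with org.webrtc / stream-webrtc-android. The two pitfalls below are known traps; follow them exactly when doing AI Vibe Coding, otherwise the call connects but is unusable:

1. Configure AudioManager routing (otherwise there is no sound after the call connects)

WebRTC does not switch the system audio route automatically. Android defaults to AudioManager.mode = MODE_NORMAL, so the remote audio track is routed to the wrong path, which shows up as "call established but completely silent".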

xml
<!-- AndroidManifest.xml; required just like RECORD_AUDIO -->
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

kotlin
// Switch to IN_COMMUNICATION mode once PeerConnection.PeerConnectionState.CONNECTED is reached
override fun onConnectionChange(state: PeerConnection.PeerConnectionState?) {
    if (state == PeerConnection.PeerConnectionState.CONNECTED) {
        val am = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
        am.mode = AudioManager.MODE_IN_COMMUNICATION
        am.isMicrophoneMute = false
        am.isSpeakerphoneOn = true   // speaker by default; set to false for earpiece mode
    }
}

// On hangup / teardown, restore MODE_NORMAL; otherwise the system ringtone and media volume are affected
fun teardownPeer() {
    val am = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    am.mode = AudioManager.MODE_NORMAL
    am.isSpeakerphoneOn = false
    // ...
}

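2. On call_offer, cache the SDP and switch to the INCOMING state

Since PWA SDK 1.0.20 the caller sends call_offer directly (no separate call_invite). If the Android callee only caches the SDP without updating the UI state (for example _state.value = INCOMING), the result is "the PWA has dialed, but Android shows no reaction and does not ring".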

kotlin
"call_offer" -> {
    val sdpStr = frame["sdp"] as? String ?: return
    if (peerConnection != null) {
        peerConnection.setRemoteDescription(...)
    } else {
        // The PC is not created until the user answers, so cache the SDP first
        pendingOfferSdp = sdpStr
        // Key step: switch to INCOMING so the UI layer renders the ringing screen
        if (state == State.IDLE) {
            val isVideo = sdpStr.contains("\nm=video ") || sdpStr.startsWith("m=video ")
            _info.value = CallInfo(callId, from, isCaller = false,
                                   mode = if (isVideo) Mode.VIDEO else Mode.AUDIO)
            _state.value = State.INCOMING
        }
    }
}

Reference implementation: template-app-android/.../call/CallManager.kt (compatible with SDK 1.0.20).

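Frame E2EE details (2026-04 P3.9)

  • Each direction keeps its own counter; IV = baseIV ⊕ counter_le_8B
  • Frame format: counter(8B big-endian) || AES-GCM(ciphertext||tag)
  • The receiver derives the IV from the counter in the frame header and does not trust an explicit IV carried by the sender
  • A counter repeated within the 2048-frame window causes the frame to be dropped (replay protection)
  • When cumulative encryption under a single key reaches 80% of 2^24 frames or 16 GiB, the Worker calls postMessage({type:'rekey-needed'}) to tell the call layer to renegotiate the key; at the 100% threshold it refuses to encrypt further frames, preventing AES-GCM (key, IV) reuse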
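The counter handling and rekey thresholds above can be sketched as follows. Several details are assumptions made for illustration and are marked as such: the 12-byte IV length, XOR-ing the little-endian counter into the first 8 IV bytes, and all function names. The SDK's actual byte layout may differ.

```typescript
// 8-byte encoding of a non-negative integer counter (must be < 2^53).
function counterBytes(counter: number, littleEndian: boolean): Uint8Array {
  const out = new Uint8Array(8)
  const view = new DataView(out.buffer)
  const hi = Math.floor(counter / 0x100000000)
  const lo = counter % 0x100000000
  if (littleEndian) {
    view.setUint32(0, lo, true)
    view.setUint32(4, hi, true)
  } else {
    view.setUint32(0, hi, false)
    view.setUint32(4, lo, false)
  }
  return out
}

// IV = baseIV XOR counter_le_8B.
// ASSUMPTION: 12-byte IV, counter XOR-ed into bytes 0..7; bytes 8..11 unchanged.
function deriveIv(baseIv: Uint8Array, counter: number): Uint8Array {
  const iv = new Uint8Array(baseIv) // copy, never mutate the base IV
  const le = counterBytes(counter, true)
  for (let i = 0; i < 8; i++) iv[i] ^= le[i]
  return iv
}

const MAX_FRAMES = 16777216                 // 2^24
const MAX_BYTES = 16 * 1024 * 1024 * 1024   // 16 GiB

// 'rekey-needed' at 80% of either limit, 'refuse' at 100%.
function rekeyStatus(frames: number, bytes: number): 'ok' | 'rekey-needed' | 'refuse' {
  if (frames >= MAX_FRAMES || bytes >= MAX_BYTES) return 'refuse'
  if (frames >= 0.8 * MAX_FRAMES || bytes >= 0.8 * MAX_BYTES) return 'rekey-needed'
  return 'ok'
}

const baseIv = new Uint8Array(12)           // all zeros, for readability only
console.log(Array.from(deriveIv(baseIv, 258)))        // [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
console.log(Array.from(counterBytes(258, false)))     // frame header: [0, 0, 0, 0, 0, 0, 1, 2]
console.log(rekeyStatus(14000000, 0))                 // rekey-needed
```

Because each counter value yields a distinct IV under a given key, refusing to encrypt past the limit is what rules out (key, IV) reuse.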

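Video call troubleshooting guide (added 2026-04-27)

Four things to confirm before troubleshooting (otherwise you will go down the wrong path 99% of the time)

Before reporting "video calls don't work", rule out subjective misjudgment first:

  1. Show me a screenshot / screen recording. The text description "can't see the remote video" is too subjective; it may be:
    • A genuinely black screen (SurfaceView not attached)
    • The camera pointing at a dark object (video is showing, but the content is very dark)
    • The camera physically pointing at the wrong thing (for example an image on a computer screen, so you assume it is not working)
  2. Show me the WebRTC stats. Open chrome://webrtc-internals/ in Chrome and look at outbound-rtp (kind=video) and inbound-rtp (kind=video):
    • bytesSent / framesEncoded keep growing → your side is sending
    • bytesReceived / framesDecoded keep growing → you are receiving what the peer sends
    • The numbers do not grow → only then is it a real problem
  3. SDK 1.0.27+ ships a built-in Diag dump. Once the call connects, the console prints one 📊 [Diag] line every 2 s; pasting 5 to 10 of them is far more precise than a text description
  4. Android logcat. Focus on the org.webrtc.Logging tag:
    • SurfaceEglRenderer: Reporting first rendered frame. = the first frame has been rendered
    • EglRenderer: Frames received: N. Dropped: 0. Rendered: N. Render fps: X = rendering continuously
    • If both lines appear, Android is already displaying the remote video and the problem is not on Android

As soon as any of these data points shows "sending / receiving / rendering", stop changing the SDK; the problem is the physical camera position or the UI layout.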
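The "do the counters keep growing" check in step 2 can be automated by comparing two stats samples taken a few seconds apart. The sketch below works on plain numbers; in a browser they would come from the outbound-rtp and inbound-rtp (kind=video) entries returned by RTCPeerConnection.getStats(). The type and function names are hypothetical.

```typescript
// One sample of a video RTP stream's counters:
// bytesSent/framesEncoded for outbound, bytesReceived/framesDecoded for inbound.
interface RtpSample { bytes: number; frames: number }

// True when both counters grew between two samples.
function isFlowing(prev: RtpSample, curr: RtpSample): boolean {
  return curr.bytes > prev.bytes && curr.frames > prev.frames
}

type Verdict = 'sending-and-receiving' | 'not-sending' | 'not-receiving' | 'no-media'

// Hypothetical helper: classify a call from outbound and inbound sample pairs.
function diagnoseVideo(
  outbound: [RtpSample, RtpSample],
  inbound: [RtpSample, RtpSample],
): Verdict {
  const sending = isFlowing(outbound[0], outbound[1])
  const receiving = isFlowing(inbound[0], inbound[1])
  if (sending && receiving) return 'sending-and-receiving'
  if (!sending && !receiving) return 'no-media'
  return sending ? 'not-receiving' : 'not-sending'
}

console.log(
  diagnoseVideo(
    [{ bytes: 1000, frames: 10 }, { bytes: 9000, frames: 70 }],
    [{ bytes: 500, frames: 5 }, { bytes: 500, frames: 5 }],
  ),
) // not-receiving
```

A 'sending-and-receiving' verdict with a dark picture points at the camera or the UI layout rather than the SDK.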

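Pre-launch checklist for video calls

| Item | How to check | Expected result |
| --- | --- | --- |
| Caller SDP m=video is a=sendrecv | In the console, check the m-lines on the 🔴 STEP 6.5 line | Must contain a=sendrecv (automatic in SDK 1.0.26+) |
| Callee SDP m=video is a=sendrecv | In the console, check the m-lines on the 🟢 ANSWER STEP 3.5 line | Must contain a=sendrecv |
| getUserMedia(video) failure raises an error | Deny camera permission; the UI should show an alert | UI shows an error (SDK 1.0.25+ no longer degrades silently) |
| Caller answer timeout has a fallback | Test by unplugging the network; after hanging for 15 s the UI should end the call automatically | onError fires with "peer did not answer" (SDK 1.0.24+) |
| ICE failure self-heals via restart | NAT change / one-sided TURN timeout | restartIce runs once and connectionState returns to connecting (SDK 1.0.24+) |
| Both ends have outbound + inbound data | 1.0.27 Diag dump or webrtc-internals | bytes/frames keep growing in both directions |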

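Known "symptoms" that are not SDK bugs

Changing the SDK version does not help in the following cases; do not modify the SDK:

  • The PWA camera is physically pointing at something dark / black / OBS software, etc.: the "black screen" you see is a real video frame, not a rendering failure
  • Android shows what the PWA camera actually captures, but you expected something else: the remote video is whatever the remote camera is currently pointing at, not a UI screenshot
  • The local preview window showing your own face is normal: it is the callee previewing its own camera, not the remote side
  • Your own face in the small window and a black/dark main view: the main view is in fact showing the peer, whose camera simply has a very dark field of view

Objective signals of a real bug

Only when at least one of the following holds is it a real bug in the SDK / signaling / rendering layer:

  • 📛 outbound-rtp video bytesSent is always 0 (the sender is dead)
  • 📛 inbound-rtp video bytesReceived is always 0 (no RTP is received)
  • 📛 Android logcat never shows SurfaceEglRenderer: Reporting first rendered frame (rendering never started)
  • 📛 PeerConnection state never reaches connected (ICE failed)
  • 📛 The PWA shows neither the camera permission prompt nor an error dialog (SDK 1.0.25+ always shows one of the two)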

Troubleshooting

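Versions 1.0.3 through 1.0.12 added the following diagnostic logs (console.error level, kept in production bundles):

| Prefix | Meaning | Where it appears |
| --- | --- | --- |
| 🔥 [App] | App-layer button clicks and onClick events | ChatWindow / CallScreen |
| 🟠 [App] | mod.call() invocation boundary | Button onClick |
| 🔴 [CallModule] STEP 1~8 | Caller-side call() progress | SDK |
| 🟢 [CallModule] RCVD STEP 1~8 | Callee-side handleOffer() progress | SDK |
| 🟢 [CallModule] ANSWER STEP 0~6 | Callee-side answer() progress | SDK |
| 🟡 [CallModule] | iceServers config, iceGatheringState | SDK |
| 🔵 [CallModule] | handleSignal call_id flow | SDK |

Usage: when a call fails, capture the complete console output on the caller and the callee separately; the failing STEP maps directly to a line of SDK code. The errorText of onicecandidateerror gives the STUN/TURN-level failure reason.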


Call Lifecycle & Media Quality (1.0.29+)

Two SDK-level guarantees added in 1.0.29, plus 1.0.30 hardens timeout cancellation, for consistent cross-platform call behavior:

1.0.30 addition: timeout cancellation notifies the peer

When the caller's answer timeout (15s) fires, the SDK now sends call_hangup to the peer before cleanup instead of only cleaning up locally. This fixes the "ghost call" issue: previously, when the PWA silently aborted on timeout, the Android user could still tap "Accept" later and walk through the full answer flow. The SDK would then negotiate for 19s before ICE FAILED kicked in, so the user believed a call timer was counting up when the screen was actually showing 19s of "Establishing secure channel...".

When the callee receives this call_hangup while still in ringing state, the incoming-call UI dismisses immediately, eliminating ghost-call windows.

Lifecycle: automatic call termination

Beyond the explicit call_hangup / call_reject signaling, the SDK now self-terminates calls in two scenarios where the signal might never arrive (peer App killed by OS, network down, etc):

| Trigger | Timeout | Action |
| --- | --- | --- |
| RTCPeerConnection.connectionState === 'disconnected' persists | 8 seconds | cleanup('ended'), which fires state_change with 'ended' |
| Inbound RTP bytesReceived / framesDecoded no growth | 15 seconds | cleanup('ended'), same as above |

These timeouts must match the Android SDK (8000ms / 15000ms) to avoid timeout drift across platforms.

The SDK also clears the disconnected timer on 'connected' / 'connecting' so it doesn't fight ICE restart.
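The two termination rules, including the timer reset on 'connected' / 'connecting', can be modeled as a small state holder that is driven with explicit timestamps. This is a sketch of the described behavior, not the SDK's code; `CallWatchdog` and its method names are hypothetical.

```typescript
const DISCONNECTED_TIMEOUT_MS = 8000  // matches the Android SDK value quoted above
const RTP_STALL_TIMEOUT_MS = 15000

class CallWatchdog {
  private disconnectedSince: number | null = null
  private lastRtpGrowthAt: number
  private lastBytes = 0

  constructor(startMs: number) {
    this.lastRtpGrowthAt = startMs
  }

  // Feed RTCPeerConnection.connectionState changes.
  onConnectionState(state: string, nowMs: number): void {
    if (state === 'disconnected') {
      if (this.disconnectedSince === null) this.disconnectedSince = nowMs
    } else if (state === 'connected' || state === 'connecting') {
      this.disconnectedSince = null // do not fight an ICE restart
    }
  }

  // Feed the inbound-rtp bytesReceived counter from periodic stats polling.
  onInboundBytes(bytesReceived: number, nowMs: number): void {
    if (bytesReceived > this.lastBytes) {
      this.lastBytes = bytesReceived
      this.lastRtpGrowthAt = nowMs
    }
  }

  // True when the call should be cleaned up with 'ended'.
  shouldEnd(nowMs: number): boolean {
    const disconnectedTooLong =
      this.disconnectedSince !== null &&
      nowMs - this.disconnectedSince >= DISCONNECTED_TIMEOUT_MS
    const rtpStalled = nowMs - this.lastRtpGrowthAt >= RTP_STALL_TIMEOUT_MS
    return disconnectedTooLong || rtpStalled
  }
}

const watchdog = new CallWatchdog(0)
watchdog.onInboundBytes(1000, 1000)
watchdog.onConnectionState('disconnected', 2000)
console.log(watchdog.shouldEnd(9000))  // false (7 s disconnected, 8 s since last RTP)
console.log(watchdog.shouldEnd(10000)) // true  (8 s disconnected)
```

Passing the clock in as a parameter keeps the logic deterministic and easy to test; a real integration would call shouldEnd from a timer.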

Media quality: 720p30 + 1.5 Mbps default

Video calls now default to:

| Property | Value |
| --- | --- |
| Capture resolution | 1280×720 (ideal, browser may downgrade) |
| Capture framerate | 30 fps (ideal) |
| Max send bitrate | 1,500,000 bps (set via RTCRtpSender.setParameters after connect) |
| Degradation pref. | 'maintain-framerate' (drop resolution before framerate) |
| Codec preference | Not in this release: the SDK does not call setCodecPreferences; codec is decided by browser/Android default negotiation. Planned for a separate future release. |
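For orientation, these defaults correspond roughly to the following standard WebRTC settings. The sketch uses local stand-in types instead of DOM typings and a hypothetical `withDefaultQuality` helper; it illustrates the values in the table and is not the SDK's internal code.

```typescript
// Capture constraints as they would be passed to getUserMedia({ video: ... }).
const DEFAULT_VIDEO_CONSTRAINTS = {
  width: { ideal: 1280 },
  height: { ideal: 720 },
  frameRate: { ideal: 30 },
}

const MAX_SEND_BITRATE_BPS = 1500000

// Stand-ins for the relevant parts of RTCRtpSendParameters.
interface EncodingLike { maxBitrate?: number }
interface SendParamsLike { encodings: EncodingLike[]; degradationPreference?: string }

// Hypothetical helper: returns a copy of the sender parameters with the
// documented bitrate cap and degradation preference applied.
function withDefaultQuality(params: SendParamsLike): SendParamsLike {
  const base: EncodingLike[] = params.encodings.length > 0 ? params.encodings : [{}]
  return {
    ...params,
    encodings: base.map((enc) => ({ ...enc, maxBitrate: MAX_SEND_BITRATE_BPS })),
    degradationPreference: 'maintain-framerate',
  }
}

// In a browser, after the connection is up, this would be used as:
//   await sender.setParameters(withDefaultQuality(sender.getParameters()))
const tuned = withDefaultQuality({ encodings: [{}] })
console.log(tuned.encodings[0].maxBitrate)   // 1500000
console.log(tuned.degradationPreference)     // maintain-framerate
console.log(DEFAULT_VIDEO_CONSTRAINTS.width.ideal, DEFAULT_VIDEO_CONSTRAINTS.height.ideal) // 1280 720
```

Using ideal rather than exact constraints is what lets the browser fall back to a lower resolution instead of failing capture.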

What you need to do

Nothing. The SDK handles all of the above internally. Verify by:

  1. Make a video call between PWA ↔ Android.
  2. After ~10 seconds open browser console, look for 📊 [Diag #N] lines.
  3. Expected: frame=1280x720, fps>=24, qLimit=none (or bandwidth on slow networks).

If you see qLimit=cpu it means the device is encoder-bottlenecked — the SDK's degradationPreference: 'maintain-framerate' will already be downgrading resolution to keep 30 fps, but worst-case clients may fall to ~480p.

Why these constants are not configurable

If a user app sets maxBitrate to 3 Mbps but the peer SDK still has 1.5 Mbps, WebRTC's bandwidth estimator picks the lower value — meaning your config is silently ignored. Same for the disconnected timeout: if PWA is 8s and Android is 30s, Bug 1 (state desync) reappears in one direction.

The constants are intentionally internal so apps can't drift them. Tune via SDK PR if needed.
