Video Conferencing Systems
Video conferencing systems transmit real-time audio and video streams between endpoints, enabling face-to-face communication without physical presence. These systems comprise endpoint devices that capture and display media, network infrastructure that transports encoded streams, and signalling services that establish and manage sessions. For organisations operating across multiple locations with staff in headquarters, regional hubs, field offices, and remote positions, video conferencing infrastructure determines whether visual communication functions reliably or degrades into frustration.
- Endpoint
- A device that captures, encodes, transmits, receives, decodes, and displays video and audio. Endpoints range from software clients on laptops to dedicated room systems with specialised hardware.
- Codec
- The algorithm that compresses raw video and audio into transmittable streams and decompresses received streams for display. Common video codecs include H.264, H.265/HEVC, VP8, VP9, and AV1.
- MCU (Multipoint Control Unit)
- A server that receives streams from multiple endpoints, decodes them, composites them into layouts, re-encodes, and distributes to all participants. MCUs enable group conferences but consume substantial processing resources.
- SFU (Selective Forwarding Unit)
- A server that receives streams from endpoints and forwards them selectively without transcoding. Each endpoint receives streams directly and handles decoding locally, reducing server load but increasing endpoint requirements.
- Simulcast
- A technique where endpoints encode and transmit the same video at multiple quality levels simultaneously. The SFU selects which quality level to forward to each recipient based on their available bandwidth.
Signal and media architecture
Video conferencing separates two distinct communication channels: signalling handles session establishment, participant management, and call control; media carries the actual audio and video streams. This separation allows different transport mechanisms optimised for each purpose.
Signalling messages are small, infrequent, and must arrive reliably. A missed “participant joined” notification breaks the user experience, so signalling uses reliable transport (TCP or WebSocket) and can tolerate latency of several hundred milliseconds without perceptible impact. The Session Initiation Protocol (SIP) dominates enterprise video conferencing signalling. SIP messages describe session parameters, negotiate codecs, and exchange network addresses for media streams. Web-based systems increasingly use WebSocket connections carrying JSON or proprietary message formats, though many interoperate with SIP for standards-based room systems.
Media streams are large, continuous, and latency-sensitive. A video stream at 1080p30 generates 2-6 Mbps of encoded data depending on codec and scene complexity. This data must arrive within approximately 150 milliseconds end-to-end for natural conversation; delays beyond 300 milliseconds make real-time interaction difficult. Media therefore uses UDP transport, accepting occasional packet loss in exchange for lower latency. The Real-time Transport Protocol (RTP) structures media packets with sequence numbers, timestamps, and synchronisation information. RTCP (RTP Control Protocol) runs alongside RTP, carrying statistics about packet loss, jitter, and round-trip time that enable adaptive quality adjustment.
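The RTP fields mentioned above live in a fixed 12-byte header. As an illustrative sketch (not tied to any particular media stack), the header can be unpacked directly; the sample packet bytes are fabricated for the example:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,         # always 2 for RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,  # maps to the negotiated codec
        "sequence": seq,            # detects loss and reordering
        "timestamp": ts,            # media clock for sync and jitter
        "ssrc": ssrc,               # identifies the stream source
    }

# Fabricated sample: version 2, marker set, payload type 96,
# sequence 1000, timestamp 160000, SSRC 0x11223344.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, 160000, 0x11223344)
hdr = parse_rtp_header(sample)
```

The sequence number and timestamp are exactly what RTCP statistics (loss, jitter) are computed from on the receiving side.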
+------------------------------------------------------------------+
|                         SIGNALLING FLOW                          |
+------------------------------------------------------------------+

 Endpoint A            Signalling Server             Endpoint B
     |                        |                          |
     |----(1) INVITE--------->|                          |
     |      (SDP offer)       |                          |
     |                        |----(2) INVITE----------->|
     |                        |      (SDP offer)         |
     |                        |                          |
     |                        |<---(3) 200 OK------------|
     |                        |      (SDP answer)        |
     |<---(4) 200 OK----------|                          |
     |      (SDP answer)      |                          |
     |                        |                          |
     |----(5) ACK------------>|----(6) ACK-------------->|
     |                        |                          |

+------------------------------------------------------------------+
|                           MEDIA FLOW                             |
+------------------------------------------------------------------+
     |                                                   |
     |<================(7) RTP/RTCP====================>|
     |     (direct peer-to-peer or via media server)     |
     |                                                   |

Figure 1: Separation of signalling (SIP/WebSocket over TCP) and media (RTP over UDP)
The Session Description Protocol (SDP) carried within SIP messages describes media capabilities: supported codecs, payload formats, encryption parameters, and candidate network addresses. Endpoints exchange SDP offers and answers to negotiate mutually supported configurations. When Endpoint A offers H.264 and VP9 video codecs, and Endpoint B supports only H.264, the negotiated session uses H.264. This negotiation happens during call setup and can be renegotiated mid-call if conditions change.
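At its core, the offer/answer negotiation described here reduces to intersecting capability lists while honouring the offerer's preference order. A minimal sketch (the helper and codec names are illustrative):

```python
def negotiate_codecs(offered: list[str], supported: list[str]) -> list[str]:
    """Return the codecs both sides support, preserving the offerer's
    preference order, as in SDP offer/answer negotiation."""
    answer = [codec for codec in offered if codec in supported]
    if not answer:
        raise ValueError("no mutually supported codec; session cannot be established")
    return answer

# Endpoint A offers VP9 and H.264; Endpoint B supports only H.264 (plus Opus audio).
negotiated = negotiate_codecs(["VP9", "H264"], ["H264", "opus"])
```

A real SDP answer also pins down payload type numbers, encryption parameters, and ICE candidates, but the codec intersection is the part that decides what the session actually uses.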
Endpoint categories
Video endpoints span a spectrum from software applications to dedicated hardware systems, with each category suited to different use cases and environments.
Software endpoints run as applications on general-purpose computers, tablets, and smartphones. The device’s built-in camera, microphone, and speakers serve as capture and playback hardware, while the CPU handles encoding and decoding. Software endpoints cost nothing beyond the device itself, deploy instantly via download, and update automatically. Their quality depends entirely on the host device: a laptop with a mediocre webcam in a noisy environment produces mediocre video and audio regardless of the conferencing platform. Software endpoints work well for individual participation from desks or home offices where each person has their own device.
USB peripherals enhance software endpoints with better capture hardware. External webcams with larger sensors and better optics produce sharper video than laptop cameras. USB speakerphones with beamforming microphone arrays and echo cancellation improve audio dramatically. These peripherals connect to the same software endpoint but address its weakest components. A USB speakerphone costing £100-300 transforms a laptop into a viable small-meeting endpoint.
Room systems are dedicated hardware designed for shared meeting spaces. A room system integrates display, camera, microphones, speakers, and compute into a purpose-built package. The camera mounts to capture the entire room; ceiling microphones or tabletop arrays cover all seating positions; speakers project to all occupants. Dedicated hardware encodes and decodes without competing for resources with other applications. Room systems provide consistent quality regardless of which laptop a participant brings. They range from compact units for 4-person huddle spaces to sophisticated systems for 20-person boardrooms with multiple displays and PTZ (pan-tilt-zoom) cameras.
Immersive systems represent the high end of room video, creating life-size representations of remote participants. Multiple cameras capture different angles; multiple displays arranged to match create the illusion that remote participants sit across the table. Precise audio placement reinforces spatial presence. These systems require purpose-built rooms with controlled lighting, acoustics, and furniture placement. Installation costs exceed £50,000 for a single room, making them viable only for high-value executive or board use.
+-------------------------------------------------------------------+
|                       ROOM SYSTEM COMPONENTS                      |
+-------------------------------------------------------------------+
|                                                                   |
|   +-------------------+           +-------------------+           |
|   | DISPLAY(S)        |           | CAMERA(S)         |           |
|   |                   |           |                   |           |
|   | - Primary display |           | - Wide-angle      |           |
|   | - Content display |           | - PTZ tracking    |           |
|   | - Touch panel     |           | - Multiple angles |           |
|   +--------+----------+           +--------+----------+           |
|            |                               |                      |
|            +----------+       +------------+                      |
|                       |       |                                   |
|              +--------v-------v--------+                          |
|              |   COMPUTE/CODEC UNIT    |                          |
|              |                         |                          |
|              | - Video encode/decode   |                          |
|              | - Audio processing      |                          |
|              | - Network interface     |                          |
|              | - Room control          |                          |
|              +-----+-------------+-----+                          |
|                    |             |                                |
|        +-----------v-+        +--v-----------------+              |
|        | AUDIO       |        | MICROPHONES        |              |
|        |             |        |                    |              |
|        | - Speakers  |        | - Ceiling array    |              |
|        | - Amp       |        | - Table pods       |              |
|        +-------------+        +--------------------+              |
|                                                                   |
+-------------------------------------------------------------------+
Figure 2: Room system component architecture showing integration between display, capture, compute, and audio
The choice between endpoint categories follows the meeting context. Individual desk workers use software endpoints on their own devices. Small meeting rooms (2-6 people) benefit from USB peripherals enhancing a laptop or from compact room systems. Medium meeting rooms (6-12 people) require room systems with proper microphone coverage and camera angles. Large meeting rooms (12+ people) need sophisticated room systems with multiple cameras and microphone zones. Purpose-built collaboration spaces for frequent high-stakes meetings justify immersive systems.
Deployment architecture
Video conferencing platforms operate in three deployment models: cloud-hosted services, on-premises infrastructure, and hybrid configurations combining both. The model determines where media processing occurs, where recordings reside, and what control the organisation retains.
Cloud-hosted platforms run signalling servers, media servers, and all supporting infrastructure in provider data centres. The organisation purchases subscriptions and connects endpoints via the internet. Cloud platforms eliminate infrastructure management entirely; the provider handles scaling, updates, redundancy, and geographic distribution. Providers operate media servers across dozens of regions worldwide, routing calls through servers near participants to minimise latency. The organisation trades infrastructure control for operational simplicity.
Cloud model constraints centre on data handling. Media streams transit provider infrastructure even for calls between two endpoints in the same office. Recordings stored by the provider reside in jurisdictions the organisation may not control. Organisations subject to data sovereignty requirements must verify that providers offer regional data residency and understand precisely which data types remain in-region versus replicated globally. Provider terms typically grant broad rights to use aggregated or anonymised meeting metadata. For organisations handling sensitive discussions, cloud platform selection requires careful review of data processing agreements.
On-premises deployment places all video infrastructure within organisation-controlled facilities. Signalling servers, media servers, and recording storage run on organisation hardware or private cloud resources. Call media never transits third-party infrastructure; recordings remain entirely within organisational control. This model provides maximum data sovereignty and enables operation without internet connectivity.
On-premises infrastructure demands substantial operational capacity. The organisation must provision servers, configure networking, manage certificates, apply security patches, plan capacity, and maintain high availability. A robust on-premises deployment requires redundant servers, load balancing, database replication, and geographic distribution across data centres. Few organisations outside large enterprises maintain the expertise and budget for reliable on-premises video infrastructure. Organisations considering on-premises deployment must honestly assess whether their IT function can maintain complex real-time communication infrastructure alongside all other responsibilities.
Hybrid architectures position specific components on-premises while leveraging cloud infrastructure for others. A common pattern places on-premises call control and recording with cloud-based meeting services for external participants. Internal meetings between staff route through on-premises infrastructure; meetings including external parties route through cloud services. This preserves data sovereignty for internal communications while enabling collaboration with partners, donors, and communities who cannot access on-premises systems.
+-------------------------------------------------------------------+
|                         CLOUD DEPLOYMENT                          |
+-------------------------------------------------------------------+
|                                                                   |
|  Organisation                          Cloud Provider             |
|  +----------------+                    +--------------------+     |
|  |                |                    |                    |     |
|  | Endpoints      +----[Internet]----->+ Signalling         |     |
|  |                |                    | Media servers      |     |
|  | - Laptops      |                    | Recording          |     |
|  | - Room sys     |                    | (multi-region)     |     |
|  | - Mobile       |                    |                    |     |
|  +----------------+                    +--------------------+     |
|                                                                   |
+-------------------------------------------------------------------+
|                      ON-PREMISES DEPLOYMENT                       |
+-------------------------------------------------------------------+
|                                                                   |
|  Organisation Network                                             |
|  +----------------------------------------------------------+     |
|  |                                                          |     |
|  |  +------------+        +-------------------+             |     |
|  |  |            |        |                   |             |     |
|  |  | Endpoints  +------->+ On-prem servers   |             |     |
|  |  |            |        |                   |             |     |
|  |  +------------+        | - Signalling      |             |     |
|  |                        | - Media           |             |     |
|  |                        | - Recording       |             |     |
|  |                        +-------------------+             |     |
|  |                                                          |     |
|  +----------------------------------------------------------+     |
|                                                                   |
+-------------------------------------------------------------------+
|                        HYBRID DEPLOYMENT                          |
+-------------------------------------------------------------------+
|                                                                   |
|  Organisation                          Cloud                      |
|  +----------------+                    +--------------------+     |
|  |                |                    |                    |     |
|  | On-prem core   |<------------------>| Cloud edge         |     |
|  |                |                    |                    |     |
|  | - Internal     |                    | - External guests  |     |
|  |   meetings     |                    | - Federation       |     |
|  | - Recording    |                    | - Webinar          |     |
|  +----------------+                    +--------------------+     |
|                                                                   |
+-------------------------------------------------------------------+
Figure 3: Deployment model comparison showing data flow and component location
Media processing models
When more than two participants join a video conference, the system must distribute streams between all parties. Two architectural approaches handle this multiparty scenario with fundamentally different resource trade-offs.
The Multipoint Control Unit receives encoded streams from all participants, decodes them, composites them into a unified layout (speaker view, gallery view, or similar), re-encodes the composite, and sends identical streams to all participants. Each participant receives one stream regardless of participant count. Endpoint requirements remain constant; a device that handles two-party calls handles hundred-party calls equally. The MCU bears all processing burden, requiring powerful servers that scale with participant count and video resolution.
MCU architectures suit environments with constrained endpoints. When participants join from feature phones, older devices, or bandwidth-limited connections, the MCU handles complexity centrally. The trade-off is infrastructure cost: MCU processing for a 25-participant meeting at 720p demands substantial CPU resources. Cloud providers absorb this cost within subscription pricing; on-premises deployments must provision accordingly.
The Selective Forwarding Unit receives streams from all participants but does not decode or re-encode them. Instead, it examines stream metadata and forwards streams selectively to each recipient. If Participant A requests only the active speaker and Participant B requests a gallery of four, the SFU sends different stream sets to each. Endpoints decode multiple streams locally and composite their own display layouts.
SFU architectures shift processing burden to endpoints. The server merely routes packets, requiring far less CPU than transcoding. Bandwidth consumption increases because each endpoint receives multiple streams rather than one composite. A 25-participant gallery view requires the endpoint to decode 25 streams simultaneously. Modern laptops and smartphones handle this load for standard meeting sizes, but constrained devices struggle.
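The resource trade-off between the two architectures can be made concrete with a back-of-envelope model; the figures it produces are illustrative counts, not measurements of any real deployment:

```python
def mcu_server_ops(participants: int) -> dict:
    """MCU: decode every inbound stream, encode one shared composite
    layout, and send a copy of that composite to every participant."""
    return {"decodes": participants, "encodes": 1, "streams_out": participants}

def sfu_server_ops(participants: int, streams_per_viewer: int) -> dict:
    """SFU: no transcoding at all; it only forwards the streams each
    viewer has requested (e.g. the tiles of a gallery layout)."""
    return {
        "decodes": 0,
        "encodes": 0,
        "streams_out": participants * streams_per_viewer,
    }
```

For a 25-person meeting the MCU performs 25 decodes plus an encode per layout, while the SFU performs none; the SFU instead pushes more streams onto the wire and leaves every endpoint to decode its own set.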
Simulcast enables SFU deployments to serve heterogeneous endpoints. Each sender encodes their video at three quality levels simultaneously: high (1080p), medium (720p), and low (360p). The SFU forwards the appropriate quality to each recipient based on their available bandwidth and requested layout. A participant viewing a 25-person gallery receives low-quality streams; a participant viewing one active speaker receives the high-quality stream for that speaker and low-quality for others. Simulcast triples sender encoding load and bandwidth but enables the SFU to adapt without transcoding.
Most contemporary platforms use SFU architecture with simulcast. The endpoint processing requirements align with modern devices, while simulcast enables adaptation to varying network conditions. MCU architectures persist for interoperability scenarios connecting to legacy standards-based room systems that expect single composite streams.
Bandwidth and quality
Video conferencing consumes bandwidth in direct proportion to resolution, frame rate, and scene complexity. Understanding these relationships enables capacity planning and quality troubleshooting.
A single video stream’s bitrate depends on the codec, resolution, frame rate, and motion content. H.264 encoding at 1080p30 (1920x1080 pixels, 30 frames per second) with moderate motion produces 2-4 Mbps. The same resolution at 720p30 produces 1-2 Mbps. Lower resolutions used for gallery views (360p, 180p) consume 200-500 Kbps each. Audio streams add 50-100 Kbps per participant for high-quality voice codecs like Opus.
A participant in a 9-person gallery view on an SFU platform receives: their own self-view (rendered locally, consuming no network bandwidth), one high-quality stream for the active speaker (2 Mbps), and 7 lower-quality streams for the remaining participants (350 Kbps each). Total receive bandwidth approaches 4.5 Mbps. The participant simultaneously uploads their own stream at 2-4 Mbps. Total bandwidth per participant in this scenario reaches 6-8 Mbps symmetric.
+-------------------------------------------------------------------+
|                   BANDWIDTH ESTIMATION EXAMPLE                    |
+-------------------------------------------------------------------+
|                                                                   |
| Scenario: 12-participant meeting, SFU architecture, simulcast     |
|                                                                   |
| Per-participant UPLOAD:                                           |
|   - High quality stream (1080p30):    2.5 Mbps                    |
|   - Medium quality stream (720p30):   1.0 Mbps                    |
|   - Low quality stream (360p30):      0.3 Mbps                    |
|   - Audio stream:                     0.1 Mbps                    |
|                                       ----------                  |
|   Total upload (simulcast):           3.9 Mbps                    |
|                                                                   |
| Per-participant DOWNLOAD (gallery view):                          |
|   - Active speaker (high):            2.5 Mbps                    |
|   - Other 10 participants (low):      10 x 0.3 = 3.0 Mbps         |
|   - 11 audio streams:                 11 x 0.1 = 1.1 Mbps         |
|                                       ----------                  |
|   Total download:                     6.6 Mbps                    |
|                                                                   |
| Per-participant TOTAL:                ~10.5 Mbps symmetric        |
|                                                                   |
| Room system with 5 remote endpoints in speaker view:              |
|   Upload:                             3.9 Mbps                    |
|   Download (1 high + 4 low + audio):  4.9 Mbps                    |
|   Total:                              ~9 Mbps symmetric           |
|                                                                   |
+-------------------------------------------------------------------+
Figure 4: Bandwidth calculation for typical meeting scenarios
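This kind of arithmetic can be captured in a small estimator for capacity planning, using the same assumed per-stream bitrates as the worked example above:

```python
# Assumed per-stream bitrates in Mbps, consistent with this section.
HIGH, MEDIUM, LOW, AUDIO = 2.5, 1.0, 0.3, 0.1

def upload_mbps() -> float:
    """A simulcast sender transmits all three video layers plus audio."""
    return HIGH + MEDIUM + LOW + AUDIO

def download_mbps(participants: int, high_streams: int = 1) -> float:
    """Gallery view: a few high streams (active speaker), the rest low,
    plus one audio stream per remote participant."""
    low_streams = participants - 1 - high_streams
    return (high_streams * HIGH
            + low_streams * LOW
            + (participants - 1) * AUDIO)
```

For the 12-participant scenario this reproduces 3.9 Mbps up and 6.6 Mbps down; swapping in different layer bitrates lets you model other codecs or resolution caps.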
Network conditions between endpoint and media server determine achievable quality. Latency above 150 milliseconds one-way creates noticeable conversation lag. Packet loss above 1% causes visible artefacts as the decoder interpolates missing data. Jitter (variation in packet arrival timing) above 30 milliseconds forces larger receive buffers, adding latency. Quality of Service (QoS) markings enable network equipment to prioritise video traffic, but QoS functions only within managed networks; internet paths ignore application markings.
Adaptive bitrate algorithms respond to detected network conditions. When the endpoint or media server detects rising packet loss or increasing round-trip time, it signals the sender to reduce bitrate. The sender switches to a lower simulcast layer or reduces encoding parameters. Quality degrades gracefully rather than failing completely. Recovery occurs when conditions improve, though algorithms typically ramp up conservatively to avoid oscillation.
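A simplified, loss-based version of one adaptation step might look like the following; the thresholds and step sizes are illustrative, not those of any real congestion controller:

```python
def adapt_bitrate(current_kbps: float, loss_fraction: float,
                  min_kbps: float = 300, max_kbps: float = 2500) -> float:
    """One step of a simplified loss-based adaptation loop.

    Back off multiplicatively on meaningful packet loss; probe upward
    additively (and conservatively) when the path looks clean, to
    avoid the oscillation mentioned in the text.
    """
    if loss_fraction > 0.02:       # sustained loss: cut hard
        current_kbps *= 0.7
    elif loss_fraction < 0.005:    # clean path: ramp up slowly
        current_kbps += 50
    # Clamp to the encoder's usable range.
    return max(min_kbps, min(max_kbps, current_kbps))
```

The multiplicative-decrease/additive-increase shape is why quality drops quickly under congestion but recovers gradually afterwards.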
Recording and compliance
Video conference recordings serve compliance requirements, training purposes, and asynchronous distribution. Recording architecture determines where captured media resides and who controls access.
Cloud-based recording captures streams at the media server, encodes them into playback formats, and stores them in provider infrastructure. The meeting host initiates recording; participants receive notification. Recordings appear in provider portals or integrate with cloud storage services. This model requires no organisation infrastructure but places recordings under provider terms. Retention periods, geographic storage location, and access controls follow provider policies, which may not align with organisational requirements.
On-premises or hybrid recording captures streams locally before they leave organisation control. An organisation-operated server joins the meeting as a silent participant, receives all streams, and records to local storage. This model maintains recording sovereignty but requires server infrastructure, storage capacity, and operational maintenance. Organisations must provision storage for retained recordings: a 1-hour meeting at 1080p generates approximately 2-4 GB; organisations holding 10 meetings daily accumulate 400-800 GB monthly.
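The storage arithmetic above can be wrapped in a small planning helper; the defaults assume roughly 20 working days per month and the 2-4 GB/hour figure quoted in the text:

```python
def monthly_storage_gb(meetings_per_day: int, hours_per_meeting: float,
                       gb_per_hour: float = 3.0, days: int = 20) -> float:
    """Rough recording-storage estimate. A 1080p recording runs
    roughly 2-4 GB per hour; days defaults to ~20 working days."""
    return meetings_per_day * hours_per_meeting * gb_per_hour * days
```

Ten one-hour meetings a day lands between 400 GB (at 2 GB/hour) and 800 GB (at 4 GB/hour) per month, matching the range in the text before accounting for retention across multiple months.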
Compliance recording extends beyond simple capture. Regulated sectors require tamper-evident storage, chain-of-custody documentation, and integration with archival systems. Financial services firms operating under MiFID II must retain communications for 5-7 years in unalterable form. Healthcare organisations must handle recordings containing patient information under applicable data protection regulations. These requirements typically demand specialised compliance recording platforms rather than native meeting platform features.
Transcription and analysis capabilities increasingly accompany recording. Automated speech recognition generates text transcripts; natural language processing extracts action items, decisions, and summaries. These features provide genuine value but introduce additional data processing considerations. Transcripts constitute personal data; AI analysis may involve data transfer to model providers. Organisations must evaluate these features against data protection requirements and disable them where appropriate.
Consent for recording varies by jurisdiction. Some require all-party consent; others permit single-party consent; some prohibit recording without specific notice. Meeting platforms provide notification mechanisms but cannot ensure legal compliance, which depends on participant locations and applicable laws. Organisations must establish recording policies that account for their operating jurisdictions.
Low-bandwidth deployment
Field locations with constrained connectivity present distinct challenges for video conferencing. Satellite links, congested mobile networks, and shared connections cannot support the bandwidth demands that office broadband handles unremarkably.
Satellite connectivity imposes latency that signalling protocols tolerate but conversation dynamics struggle with. Geostationary satellite links add 600+ milliseconds round-trip latency before any processing or encoding delay. Total end-to-end latency reaches 800-1000 milliseconds, creating noticeable delay that fragments natural conversation flow. LEO satellite constellations reduce latency to 50-100 milliseconds, enabling more natural interaction, but coverage and service availability vary by location.
Bandwidth constraints force quality trade-offs. A 512 Kbps connection cannot support 1080p video; even 720p requires more bandwidth than available after accounting for other traffic. Practical deployment on constrained connections requires aggressive configuration: 360p or 480p maximum resolution, 15 frames per second rather than 30, audio-only as default with video enabled selectively. Endpoints must be configured to respect these limits rather than attempting auto-negotiation that exceeds available capacity.
Connection sharing amplifies constraints. A field office sharing a 2 Mbps satellite link among 20 staff cannot support multiple simultaneous video calls. One video conference consuming 1.5 Mbps leaves insufficient bandwidth for other users’ email, web, and application traffic. Organisations must establish policies: video conferences scheduled during off-peak hours, audio-only as default with video for specific needs, or dedicated time slots when video has priority.
Local media servers reduce WAN bandwidth for internal meetings. When three staff in a field office join a meeting with headquarters, a local media server can receive one copy of the HQ streams and distribute locally, rather than each field endpoint receiving identical streams across the satellite link. This fan-out pattern reduces WAN consumption from 3x to 1x for inbound streams. Implementation requires on-premises infrastructure at field locations, which introduces operational complexity.
Pre-positioned recordings and asynchronous video address some use cases without real-time constraints. Recorded briefings distribute overnight when bandwidth is available or unused. Staff record video messages for later transmission rather than attempting real-time calls. These approaches cannot replace interactive meetings but reduce pressure on constrained real-time capacity.
Security considerations
Video conferencing creates security exposure through multiple channels: meeting access, media content, signalling, recordings, and endpoint devices. Each requires specific controls.
Meeting access controls prevent unauthorised participation. Meeting passwords require participants to enter a code before joining. Waiting rooms hold participants until the host admits them, enabling verification before granting access. Meeting locks prevent additional participants from joining after the meeting begins. Authenticated join requires participants to sign in with organisational credentials rather than entering anonymously. Meetings for sensitive discussions should enable all of these controls; convenience-focused defaults often leave them disabled.
Media encryption protects content during transmission. Transport encryption (TLS/DTLS) secures streams between endpoints and media servers; the server decrypts and re-encrypts, maintaining access to cleartext. End-to-end encryption encrypts at the sending endpoint with keys unavailable to the media server; only receiving endpoints can decrypt. True end-to-end encryption prevents provider access to meeting content but limits features requiring server-side processing: transcription, recording, and certain MCU functions. Not all platforms claiming encryption provide end-to-end protection; transport encryption is common, end-to-end less so.
Endpoint security extends beyond the conferencing application. Camera and microphone access persists beyond meeting duration if the application remains running. Malware targeting conferencing endpoints can capture meetings without visible indication. Endpoints must receive security updates promptly; dedicated room systems require attention equal to any networked device. Physical security for room systems prevents tampering: cameras and microphones in sensitive meeting rooms present obvious surveillance targets.
Recording security requires access controls on stored files, encryption at rest, and secure deletion when retention periods expire. Recordings shared via links must use expiring, authenticated URLs rather than permanent public links. Local recordings on endpoint devices require endpoint encryption and backup.
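Expiring, authenticated links are commonly built by signing the URL path and an expiry timestamp with an HMAC. A self-contained sketch (the secret and path are placeholders, and a real service would additionally bind the link to an authenticated user):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # placeholder; a real deployment stores and rotates this securely

def sign_recording_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry and an HMAC so the link cannot be altered or
    reused indefinitely."""
    base = f"{path}?expires={expires_at}"
    sig = hmac.new(secret, base.encode(), hashlib.sha256).hexdigest()
    return f"{base}&sig={sig}"

def verify_recording_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Reject tampered or expired links; compare signatures in constant time."""
    base, _, sig = url.rpartition("&sig=")
    expected = hmac.new(secret, base.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # path or expiry was modified
    expires = int(base.rpartition("expires=")[2])
    return now < expires
```

Because the expiry is inside the signed portion, extending a link's lifetime or pointing it at a different recording invalidates the signature.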
Platform options
Video conferencing platforms span open source self-hosted solutions, commercial cloud services, and commercial on-premises products. Selection depends on organisational requirements for features, data control, operational capacity, and budget.
Open source platforms provide full control over infrastructure and data at the cost of operational responsibility. Jitsi Meet offers WebRTC-based video conferencing deployable on organisation infrastructure with no licensing cost. BigBlueButton targets educational use cases with features like shared whiteboards, breakout rooms, and recording. These platforms require Linux administration skills for deployment and maintenance, scaling expertise for larger deployments, and security vigilance for public-facing services. Organisations with capable IT teams and data sovereignty requirements find open source platforms compelling; organisations without infrastructure capacity should avoid the operational burden.
Commercial cloud platforms provide turnkey services with predictable subscription costs. Provider options span general-purpose platforms and those with nonprofit programmes offering discounted or donated access. Cloud platforms eliminate infrastructure operations but require acceptance of provider data handling, jurisdictional considerations for data storage, and feature sets determined by the provider roadmap. Selection criteria include: nonprofit pricing availability, data residency options, end-to-end encryption availability, integration with existing collaboration tools, and support for required endpoint types.
Commercial on-premises platforms offer vendor-supported software running on organisation infrastructure. These suit organisations requiring data sovereignty without capacity to operate open source platforms. Licensing costs exceed cloud subscriptions, and operational responsibility remains substantial, but vendor support provides a safety net absent from self-supported open source. This middle path makes sense for larger organisations with compliance requirements and existing data centre infrastructure.
| Deployment Model | Infrastructure Responsibility | Data Sovereignty | Operational Capacity Required | Relative Cost |
|---|---|---|---|---|
| Open source self-hosted | Full | Complete | High | Low (infrastructure only) |
| Commercial cloud | None | Provider-dependent | Low | Medium (subscription) |
| Commercial on-premises | Substantial | Complete | Medium-High | High (licence + infrastructure) |
Integration patterns
Video conferencing rarely operates in isolation. Integration with identity systems, calendaring, collaboration platforms, and room booking creates coherent user experiences.
Identity integration enables single sign-on and consistent user profiles. Platforms supporting SAML or OIDC authentication connect to organisational identity providers, eliminating separate video conferencing credentials. Users authenticate once and access meetings without additional login. Group memberships from the identity provider can govern meeting creation permissions, recording rights, and administrative access.
Calendar integration enables one-click meeting joins from calendar events. Meeting creation populates calendar invitations with join links; calendar applications display join buttons when meetings approach. This integration requires connection between the video platform and the organisation’s calendar system, whether through direct API integration or connector services.
Collaboration platform integration embeds video conferencing within daily workflows. Meetings launch from team messaging channels with participant lists derived from channel membership. Meeting recordings and transcripts post automatically to relevant channels. Chat during meetings integrates with persistent channel history. These integrations depend on platform combinations; some integrate natively, others require third-party connectors, some cannot integrate meaningfully.
Room booking integration connects physical meeting room scheduling with video endpoint configuration. When staff book a room with video capabilities, the system automatically schedules the room system to join the meeting. No manual configuration required at meeting time; staff enter the room and the meeting displays on screen. This integration requires room booking systems that communicate with video platforms, whether through native integration or custom automation.
Implementation considerations
Organisations approaching video conferencing infrastructure should assess requirements against capacity before selecting platforms and deployment models.
For organisations with limited IT capacity, cloud platforms with nonprofit programmes provide reliable video conferencing without infrastructure investment. A single IT person cannot maintain on-premises video infrastructure alongside other responsibilities. Cloud platforms handle scaling, updates, and availability. Selection should prioritise platforms offering nonprofit discounts, acceptable data residency, and integration with existing collaboration tools. Avoid customisation that creates support burden; accept platform defaults where possible.
For organisations with established IT functions, hybrid models provide data sovereignty for sensitive meetings while cloud services enable external collaboration. On-premises infrastructure handles internal meetings; cloud platforms handle meetings with external participants. This model requires sufficient IT capacity to maintain on-premises components and clear policies directing which meetings use which platform.
For organisations operating in high-risk contexts, end-to-end encrypted platforms protect meeting content from provider access and potential legal compulsion. Open source platforms deployed on organisation infrastructure eliminate provider involvement entirely but demand substantial technical capacity. Risk tolerance and operational capability must align; an insecure deployment of a theoretically secure platform provides false confidence.
For field deployments on constrained connectivity, configure endpoints for minimal bandwidth consumption: audio-only default, video resolution limits, frame rate restrictions. Establish policies for video use that prevent meeting traffic from consuming shared link capacity. Consider asynchronous video workflows where real-time communication is impractical. Test video conferencing over actual field connectivity before deployment; laboratory testing on office broadband does not predict field behaviour.