Video Conferencing

Video conferencing platforms enable real-time visual communication between geographically distributed participants through synchronised audio and video streams. These systems form a core component of collaboration infrastructure for organisations with staff across multiple offices, field locations, and home-based arrangements. The technology spans simple one-to-one calls through large-scale webinars reaching thousands of participants, with significant variation in bandwidth requirements, equipment needs, and platform capabilities across these use cases.

Video conferencing: Synchronous communication combining audio and video streams between two or more participants, requiring dedicated client software or browser-based access.
Webinar: One-to-many video broadcast where presenters share content to an audience with limited interactive capability, supporting larger participant counts than standard meetings.
SFU (Selective Forwarding Unit): Server architecture that receives video streams from each participant and selectively forwards them to others, reducing client-side processing compared to mesh topologies.
MCU (Multipoint Control Unit): Server that receives all participant streams, decodes and composites them into a single mixed stream, then encodes and distributes the result to all participants.
Simulcast: Transmission technique where clients send multiple resolution versions of their video stream, allowing the server to forward appropriate quality based on each recipient’s bandwidth.

Platform Architecture

Video conferencing platforms operate using one of three fundamental architectures, each with distinct implications for scalability, bandwidth consumption, and infrastructure requirements.

Mesh architecture connects each participant directly to every other participant. In a four-person call, each device maintains three separate video connections, sending its stream three times and receiving three incoming streams. This approach requires no server infrastructure for media relay but scales poorly: a ten-person meeting demands each participant upload their video nine times simultaneously. Mesh works adequately for calls with three or four participants on broadband connections but becomes impractical beyond this threshold.

SFU architecture interposes a server between participants. Each client uploads a single video stream to the server, which then selectively forwards that stream to other participants. The server performs no transcoding or mixing, simply routing packets. This reduces upload bandwidth requirements dramatically compared to mesh while keeping server processing minimal. Most modern platforms including Jitsi Meet, BigBlueButton, and the underlying infrastructure of commercial services use SFU architecture. A ten-person SFU meeting requires each participant to upload one stream and download nine, compared to uploading nine streams in mesh topology.

MCU architecture places greater processing burden on the server. The MCU receives all participant streams, decodes them, composites them into a single mixed video layout, re-encodes the result, and distributes one stream to each participant. Participants receive only a single video stream regardless of meeting size, dramatically reducing download bandwidth. However, MCU servers require substantial processing capacity, and the decode-composite-encode cycle adds latency. MCU deployment suits environments with severely constrained client bandwidth, such as satellite links, where the bandwidth savings outweigh the latency cost.
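The per-participant stream counts for the three architectures can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name and the architecture labels are illustrative and do not come from any platform API.

```javascript
// Streams each participant sends and receives in an n-person call,
// following the per-architecture behaviour described above.
function streamsPerParticipant(architecture, participants) {
  if (!Number.isInteger(participants) || participants < 2) {
    throw new RangeError("participants must be an integer of at least 2");
  }
  const others = participants - 1;
  switch (architecture) {
    case "mesh": // direct connection to every other participant
      return { upload: others, download: others };
    case "sfu": // one upload; the server forwards the others' streams
      return { upload: 1, download: others };
    case "mcu": // one upload; one composited stream comes back
      return { upload: 1, download: 1 };
    default:
      throw new Error(`unknown architecture: ${architecture}`);
  }
}

for (const arch of ["mesh", "sfu", "mcu"]) {
  console.log(arch, streamsPerParticipant(arch, 10));
}
```

For a ten-person call this reproduces the figures in the text: nine uploads per participant in mesh, one upload and nine downloads with an SFU, and one stream in each direction with an MCU.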

Figure: Video conferencing architectures.
Mesh (peer-to-peer): participants A, B, and C connect to one another; each participant sends to all others directly.
SFU (selective forwarding): A uploads one stream to the SFU server, which forwards streams selectively to recipients.
MCU (multipoint control): A and B send to the MCU server, which decodes, mixes, and encodes; all participants receive the same composited stream.
Hybrid (SFU + simulcast): A uploads 720p, 360p, and 180p layers to the SFU, which selects quality per recipient capability (720p to B on broadband, 180p to C on mobile).

Modern platforms combine SFU architecture with simulcast transmission. Clients encode their video at multiple resolutions simultaneously, typically 720p, 360p, and 180p. The SFU server receives all resolution variants and forwards the appropriate quality level to each recipient based on their available bandwidth and display size. A participant viewing a small thumbnail receives only the 180p stream, while someone with the speaker in full-screen view receives 720p. This adaptive approach optimises bandwidth utilisation across heterogeneous network conditions without requiring MCU processing overhead.
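The forwarding decision an SFU makes under simulcast can be sketched as a simple selection over the available layers. This is an illustration only: the layer bitrates and the `selectLayer` function are assumptions made for the example, not values or logic taken from any specific platform.

```javascript
// Hypothetical simulcast layers, ordered highest quality first.
// The bitrates are illustrative assumptions.
const LAYERS = [
  { height: 720, kbps: 1500 },
  { height: 360, kbps: 500 },
  { height: 180, kbps: 150 },
];

// Pick the highest layer that fits both the recipient's available bandwidth
// and the height at which the video is actually displayed.
function selectLayer(availableKbps, displayHeight) {
  for (const layer of LAYERS) {
    if (layer.kbps <= availableKbps && layer.height <= displayHeight) {
      return layer;
    }
  }
  return null; // nothing fits: forward no video for this sender
}

console.log(selectLayer(3000, 1080)); // full-screen view on a fast link
console.log(selectLayer(3000, 180));  // small thumbnail on the same link
console.log(selectLayer(600, 1080));  // full-screen view on a constrained link
```

Real bridges also react to changing bandwidth estimates over time; the sketch shows only the per-recipient choice at a single moment.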

Platform Selection

Platform selection requires balancing capability requirements against operational constraints including cost, data jurisdiction, self-hosting feasibility, and integration with existing infrastructure.

Open source platforms provide complete control over data location and eliminate per-user licensing costs. Jitsi Meet offers a fully functional video conferencing system deployable on organisation-controlled infrastructure. A basic Jitsi deployment handles 35 participants in a single meeting on a 4-core, 8GB server with adequate bandwidth. Scaling beyond this requires either vertical scaling to larger instances or deploying multiple video bridge servers behind a load balancer. Jitsi requires no account creation for participants, reducing friction for external meetings with partners or communities.

BigBlueButton targets educational and training contexts with integrated whiteboard, breakout rooms, shared notes, and recording capabilities. The platform requires more substantial server resources than Jitsi, with 8 cores and 16GB RAM supporting approximately 100 concurrent users across multiple sessions. BigBlueButton integrates with learning management systems through LTI (Learning Tools Interoperability) standards.

Commercial platforms offer reduced operational overhead at the cost of ongoing subscription fees and data residency constraints. Most commercial providers offer nonprofit pricing programmes with 30-70% discounts on standard rates. Microsoft Teams integrates with Microsoft 365 deployments, using existing identity infrastructure and licensing. Zoom provides robust performance on constrained networks through aggressive bandwidth adaptation and audio-priority fallback. Google Meet integrates with Google Workspace environments.

When evaluating commercial platforms, data jurisdiction requires specific attention. Video streams, recordings, chat transcripts, and meeting metadata flow through provider infrastructure. For organisations handling protection-sensitive communications or operating in contexts where surveillance is a concern, the location of this infrastructure and the legal frameworks governing access matter significantly. US-headquartered providers are subject to CLOUD Act provisions regardless of where data is physically stored.

The following comparison addresses core selection criteria:

Criterion                       | Jitsi Meet               | BigBlueButton | Microsoft Teams | Zoom
--------------------------------+--------------------------+---------------+-----------------+-----------------
Self-hosting                    | Full support             | Full support  | Not available   | Not available
Maximum participants (standard) | 75                       | 100           | 300             | 300
Webinar capacity                | Limited                  | 100+          | 1,000-10,000    | 500-50,000
Breakout rooms                  | Yes                      | Yes           | Yes             | Yes
Recording                       | Server-side              | Server-side   | Cloud           | Cloud or local
E2EE support                    | Yes (insertable streams) | No            | Limited         | Yes (paid tiers)
Nonprofit programme             | N/A (free)               | N/A (free)    | Yes             | Yes
Browser-only participation      | Yes                      | Yes           | Limited         | Yes
Minimum server (self-hosted)    | 4 core, 8GB              | 8 core, 16GB  | N/A             | N/A

Bandwidth Requirements

Video conferencing bandwidth requirements vary dramatically based on resolution, codec efficiency, and number of visible participants. Understanding these requirements enables appropriate platform configuration for different network contexts.

A single 720p video stream using modern codecs (VP9 or H.264) consumes 1.2-2.5 Mbps depending on motion complexity. Talking-head video with minimal movement compresses more efficiently than a presenter walking around a room. Audio adds 50-100 kbps using the Opus codec. Screen sharing with text-heavy content requires 150-300 kbps for readable quality; screen sharing with video content demands 2-5 Mbps.

For a participant in a meeting with nine other visible video feeds at 360p each, download bandwidth consumption reaches approximately 4.5 Mbps. Upload requirements depend on whether the platform uses simulcast: with simulcast enabled, upload reaches 3-4 Mbps for the three resolution variants; without simulcast, upload is 1.5-2 Mbps for a single stream.
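The figures in this paragraph can be combined into a rough per-participant budget. The sketch below assumes the per-feed rate implied by the text (4.5 Mbps across nine 360p feeds) and uses the upper end of each upload range; the function and field names are hypothetical.

```javascript
// Figures taken from the paragraph above; treat them as rough planning
// numbers rather than measurements.
const MBPS_PER_360P_FEED = 4.5 / 9; // nine 360p feeds come to about 4.5 Mbps
const UPLOAD_MBPS = {
  simulcast: { min: 3, max: 4 }, // three resolution variants
  single: { min: 1.5, max: 2 },  // one stream
};

// Estimated download plus upper-bound upload for one participant.
function participantBudget(visibleFeeds, simulcast) {
  const upload = simulcast ? UPLOAD_MBPS.simulcast : UPLOAD_MBPS.single;
  return {
    downloadMbps: visibleFeeds * MBPS_PER_360P_FEED,
    uploadMbps: upload.max,
  };
}

// Hypothetical helper: does a link with the given capacity cover the budget?
function linkIsSufficient(link, budget) {
  return link.downMbps >= budget.downloadMbps && link.upMbps >= budget.uploadMbps;
}

const budget = participantBudget(9, true);
console.log(budget, linkIsSufficient({ downMbps: 10, upMbps: 2 }, budget));
```

The example link has ample download capacity but only 2 Mbps of upload, so it falls short once simulcast is enabled. Asymmetric connections often fail on the upload side first.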

Field deployments with constrained connectivity require explicit bandwidth management. The following thresholds guide configuration decisions:

Available bandwidth | Recommended configuration
--------------------+-------------------------------------------------
Under 256 kbps      | Audio only; disable video
256-512 kbps        | Single 180p video stream; 2-3 participants max
512 kbps - 1 Mbps   | 360p video; 4-6 participants; disable HD
1-2 Mbps            | 720p send; 360p receive for gallery view
2-5 Mbps            | 720p send and receive; screen sharing functional
Over 5 Mbps         | Full quality; 1080p supported
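The thresholds in the table above can be expressed as a lookup. This is a sketch under one assumption the table does not settle: each row's upper bound is treated as exclusive, so a link measured at exactly 512 kbps falls into the next row up. The names are illustrative.

```javascript
// Thresholds transcribed from the table above (upper bounds in kbps).
// Treating each upper bound as exclusive is an assumption.
const PROFILES = [
  { belowKbps: 256, config: "Audio only; disable video" },
  { belowKbps: 512, config: "Single 180p video stream; 2-3 participants max" },
  { belowKbps: 1000, config: "360p video; 4-6 participants; disable HD" },
  { belowKbps: 2000, config: "720p send; 360p receive for gallery view" },
  { belowKbps: 5000, config: "720p send and receive; screen sharing functional" },
  { belowKbps: Infinity, config: "Full quality; 1080p supported" },
];

function recommendConfiguration(availableKbps) {
  if (!(availableKbps >= 0)) {
    throw new RangeError("availableKbps must be a non-negative number");
  }
  return PROFILES.find((p) => availableKbps < p.belowKbps).config;
}

for (const kbps of [100, 300, 800, 1500, 3000, 8000]) {
  console.log(`${kbps} kbps -> ${recommendConfiguration(kbps)}`);
}
```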

Platform configuration for bandwidth-constrained environments involves both server-side and client-side settings. In Jitsi, the /etc/jitsi/meet/config.js file controls quality parameters:

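// Bandwidth-constrained configuration (append after the main config object)
config.resolution = 360;
// Assign the whole constraints object: if it is not already defined in
// config.js, setting nested properties directly would throw a TypeError.
config.constraints = {
    video: { height: { ideal: 360, max: 360 } }
};
config.disableSimulcast = false;     // keep simulcast so the bridge can choose layers
config.enableLayerSuspension = true; // stop sending layers that nobody receives
config.startVideoMuted = 10;         // participants after the 10th join with video muted
config.channelLastN = 4;             // receive video only from the 4 most recent speakers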

The channelLastN parameter proves particularly effective for bandwidth conservation. Rather than receiving video streams from all participants, the client receives only the N most recently active speakers. In a meeting with 20 remote video streams and channelLastN set to 4, download bandwidth drops from approximately 7.2 Mbps (20 streams at 360p, roughly 0.36 Mbps each) to about 1.4 Mbps (4 streams), an 80% reduction.
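The saving from a last-N limit is simple to estimate. The sketch below assumes a flat per-stream rate of 0.36 Mbps, which is the 360p figure implied by the numbers above; `lastNSaving` is a hypothetical helper, not part of Jitsi.

```javascript
// Estimate download bandwidth with and without a last-N limit.
// perStreamMbps is an assumption; 0.36 is the per-stream 360p figure
// implied by the text above.
function lastNSaving(remoteStreams, lastN, perStreamMbps = 0.36) {
  const forwarded = Math.min(remoteStreams, lastN);
  const before = remoteStreams * perStreamMbps;
  const after = forwarded * perStreamMbps;
  return {
    beforeMbps: before,
    afterMbps: after,
    reductionPercent: before === 0 ? 0 : (1 - after / before) * 100,
  };
}

const saving = lastNSaving(20, 4);
console.log(
  `${saving.beforeMbps.toFixed(1)} Mbps -> ${saving.afterMbps.toFixed(1)} Mbps ` +
    `(${saving.reductionPercent.toFixed(0)}% less)`
);
```

The percentage depends only on the ratio of forwarded streams to total streams, so the saving grows with meeting size as long as N stays fixed.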

Figure: Bandwidth optimisation decision tree.
Measure available bandwidth, then follow one of three branches.
Under 512 kbps: audio-only mode with video disabled; enable audio redundancy and run Opus at 24 kbps.
512 kbps to 2 Mbps: constrained video mode with resolution=360, channelLastN=4, and simulcast on; then monitor packet loss and jitter.
Over 2 Mbps: standard operation; then monitor packet loss and jitter.
When monitoring shows loss above 5%, reduce resolution further or switch to audio-only. When loss is below 5%, maintain the current configuration.
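The decision tree above translates directly into two small functions. This is one possible reading rather than a definitive implementation: the mode names are invented for the example, loss of exactly 5% is treated as acceptable, and "reduce resolution further or switch to audio-only" is interpreted as stepping down one mode at a time.

```javascript
// Initial mode from measured bandwidth (first level of the tree).
function initialMode(bandwidthKbps) {
  if (bandwidthKbps < 512) return "audio-only";
  if (bandwidthKbps <= 2000) return "constrained-video";
  return "standard";
}

// Settings attached to each branch of the tree.
const MODE_SETTINGS = {
  "audio-only": { video: false, audioRedundancy: true, opusKbps: 24 },
  "constrained-video": { video: true, resolution: 360, channelLastN: 4, simulcast: true },
  "standard": { video: true },
};

// Monitoring step: above 5% loss, step down one mode; otherwise keep the mode.
// Audio-only has nowhere lower to go, so it is returned unchanged.
function afterMonitoring(mode, packetLossPercent) {
  if (mode === "audio-only" || packetLossPercent <= 5) return mode;
  return mode === "standard" ? "constrained-video" : "audio-only";
}

const startMode = initialMode(1200);
console.log(startMode, MODE_SETTINGS[startMode]);
console.log(afterMonitoring(startMode, 8)); // heavy loss forces a step down
```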

Satellite connections present particular challenges beyond raw bandwidth limitations. Geostationary satellite links introduce 600-800 ms round-trip latency, causing noticeable delays in conversational flow. Platform selection for satellite contexts should favour those with robust jitter buffers and graceful degradation. Pre-buffering settings can smooth playback at the cost of additional perceived delay. MCU architecture, despite its other limitations, reduces the impact of upstream packet loss since each participant sends only one stream rather than multiple simulcast variants.
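Most of the geostationary latency is a physical floor, not something a platform can tune away. The calculation below is a simplified sketch: it assumes the satellite is directly overhead and ignores processing, queuing, and terrestrial backhaul, which account for the gap between this minimum and the observed figure.

```javascript
const GEO_ALTITUDE_KM = 35786;       // geostationary orbit altitude above the equator
const LIGHT_SPEED_KM_S = 299792.458; // speed of light in vacuum

// Minimum propagation delay for one ground-to-satellite hop (straight up or down).
const hopMs = (GEO_ALTITUDE_KM / LIGHT_SPEED_KM_S) * 1000;

// A request and its reply cross four hops: up and down, then up and down again.
const minRoundTripMs = 4 * hopMs;

console.log(`one hop: ${hopMs.toFixed(0)} ms`);
console.log(`minimum round trip: ${minRoundTripMs.toFixed(0)} ms`);
```

The result is a little under half a second before any equipment delay is added, which is why buffering and degradation behaviour matter more than raw throughput on these links.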

Meeting Room Systems

Video conferencing extends beyond individual devices to dedicated meeting room installations. Room systems range from simple webcam-and-display combinations through purpose-built appliances to professional installations with multiple cameras, microphones, and displays.

A minimal room system consists of a wide-angle USB camera, a speakerphone or ceiling microphone array, and a display connected to a laptop or dedicated compute device running the conferencing client. This configuration suits rooms accommodating 4-8 people and costs £500-1,500 depending on equipment quality.

Mid-tier room systems use dedicated appliances that integrate camera, microphone, speaker, and compute into a managed unit. These devices connect to displays via HDMI and to the network via Ethernet, appearing as a single endpoint in the platform’s device management console. Examples include Poly Studio, Logitech Rally Bar, and Neat Bar. These systems cost £2,000-5,000 and suit conference rooms for 8-16 participants.

Professional installations in larger boardrooms deploy multiple cameras with speaker tracking, ceiling microphone arrays providing uniform coverage, and multiple displays showing remote participants and shared content separately. These installations require professional design and integration, with costs ranging from £15,000-50,000 depending on room size and requirements.

Figure: Meeting room configurations.
Huddle room (2-4 people): display, webcam, and speakerphone; a laptop connects all components.
Small conference (4-8 people): display, wide-angle USB camera, and tabletop speakerphone; dedicated compute or room appliance.
Medium conference (8-16 people): two displays, PTZ camera with tracking, tabletop microphone array, and a video appliance.
Boardroom (16+ people): two displays, multi-camera tracking, ceiling microphone array, and a room controller with touch panel.

Room system integration with calendar systems enables one-touch meeting join. When a calendar event includes a video conferencing link, the room system displays a “Join” button at meeting start time. This requires either native calendar integration in the room appliance or a room scheduling display that triggers the conferencing endpoint. Microsoft Teams Rooms and Zoom Rooms provide tight calendar integration with their respective platforms; platform-agnostic solutions like Poly use standards-based calendar connectors.

Audio quality in meeting rooms depends critically on microphone placement and acoustic treatment. A single tabletop speakerphone covers a 3-metre radius effectively. Beyond this, multiple microphones or ceiling arrays become necessary. Untreated rooms with hard parallel surfaces create echo and reverberation that degrade remote participants’ experience even when in-room participants perceive audio as acceptable. Acoustic panels on walls and ceiling reduce reverberation time; a target RT60 (time for sound to decay by 60 dB) of 0.4-0.6 seconds suits video conferencing well.
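A first estimate of RT60 can be made with Sabine's formula, RT60 = 0.161 × V / A, where V is the room volume in cubic metres and A is the total absorption (each surface area multiplied by its absorption coefficient). The sketch below applies it to a hypothetical 6 m × 4 m × 3 m room; the coefficients are illustrative assumptions, and real values should come from material data sheets or measurement.

```javascript
// Sabine's formula: RT60 = 0.161 * V / A, with V in cubic metres and A the
// total absorption in square metres (surface area times absorption coefficient).
function rt60Seconds(volumeM3, surfaces) {
  const absorption = surfaces.reduce((sum, s) => sum + s.areaM2 * s.coefficient, 0);
  return (0.161 * volumeM3) / absorption;
}

// Hypothetical 6 m x 4 m x 3 m room; the coefficients are illustrative only.
const roomVolume = 6 * 4 * 3;
const untreatedRoom = [
  { name: "floor", areaM2: 24, coefficient: 0.05 },
  { name: "ceiling", areaM2: 24, coefficient: 0.05 },
  { name: "walls", areaM2: 60, coefficient: 0.05 },
];
// Same room with absorptive panels covering the ceiling.
const treatedRoom = untreatedRoom.map((s) =>
  s.name === "ceiling" ? { ...s, coefficient: 0.7 } : s
);

console.log(`untreated: ${rt60Seconds(roomVolume, untreatedRoom).toFixed(2)} s`);
console.log(`treated:   ${rt60Seconds(roomVolume, treatedRoom).toFixed(2)} s`);
```

With these assumed coefficients the untreated room reverberates for over two seconds, while treating the ceiling alone brings the estimate into the 0.4-0.6 second target range.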

Recording and Transcription

Recording video conferences serves purposes ranging from compliance documentation through training content to meeting notes for absent colleagues. Recording approaches divide into local recording, where a participant’s device captures the session, and cloud or server-side recording, where the conferencing infrastructure captures streams.

Local recording captures the video and audio as rendered on the recording participant’s device, including any bandwidth-induced quality reductions. The recording exists only on that participant’s device, providing data locality but requiring manual transfer or upload to share. Local recording works with any platform that allows screen capture.

Server-side recording in self-hosted deployments captures streams at full uploaded quality before any download-side bandwidth reduction. Jitsi recording uses Jibri (Jitsi Broadcasting Infrastructure), a separate service that joins meetings as a hidden participant and captures the composed view. A dedicated Jibri server with 4 cores and 8GB RAM handles one concurrent recording; multiple Jibri instances enable parallel recording of simultaneous meetings. Recordings write to local storage or cloud object storage.

Cloud recording in commercial platforms stores recordings in provider infrastructure. This simplifies operational management but creates data residency considerations. Recordings of sensitive discussions reside in provider-controlled storage, subject to provider security practices and applicable legal frameworks. Most platforms retain recordings for defined periods (30-120 days typically) before automatic deletion unless explicitly archived.

Transcription and captioning provide accessibility support and enable text-based search of meeting content. Live captioning displays spoken content as text during the meeting, supporting participants with hearing impairments and those joining from noisy environments. Post-meeting transcription generates searchable text records.

Transcription accuracy varies with audio quality, speaker accent diversity, and domain vocabulary. General-purpose transcription services achieve 85-95% accuracy on clear audio with standard accents; accuracy drops to 70-80% with heavy accents, multiple simultaneous speakers, or domain-specific terminology. Custom vocabulary training, where available, improves accuracy for organisation-specific terms.

On-premises transcription using models like Whisper provides data locality at the cost of compute infrastructure. A GPU-equipped server processes audio at approximately 10x real-time speed, meaning a one-hour recording transcribes in six minutes. CPU-only processing achieves 0.5-1x real-time, so the same recording takes one to two hours. For organisations with data sovereignty requirements preventing use of cloud transcription services, local Whisper deployment offers a viable alternative.
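Processing time follows directly from the real-time factor, which makes backlog planning straightforward. A minimal sketch, using the speed figures quoted above; the function name is illustrative.

```javascript
// Time needed to transcribe a recording, given how many seconds of audio
// the hardware processes per second of wall-clock time.
function transcriptionMinutes(recordingMinutes, realTimeFactor) {
  if (!(realTimeFactor > 0)) {
    throw new RangeError("realTimeFactor must be greater than zero");
  }
  return recordingMinutes / realTimeFactor;
}

console.log("GPU (10x):   ", transcriptionMinutes(60, 10), "minutes");
console.log("CPU (1x):    ", transcriptionMinutes(60, 1), "minutes");
console.log("CPU (0.5x):  ", transcriptionMinutes(60, 0.5), "minutes");
```

A CPU-only server running at 0.5x cannot keep pace with more than about twelve hours of new recordings per day, which is worth checking against expected meeting volume before ruling out a GPU.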

Security and Privacy

Video conferencing security encompasses transport encryption, authentication, meeting access controls, and data handling practices. The sensitivity of communications determines appropriate security posture.

Transport encryption protects streams between participants and servers from interception. All modern platforms encrypt media streams using SRTP (Secure Real-time Transport Protocol) with AES encryption. This protects against network-level eavesdropping but does not protect against access by the platform provider, whose servers necessarily process unencrypted streams in SFU architecture.

End-to-end encryption (E2EE) extends protection to cover the path between participants, preventing even the platform operator from accessing content. E2EE implementation in video conferencing uses insertable streams, where clients encrypt video frames before transmission and decrypt after reception. The SFU server forwards encrypted packets without access to plaintext content. E2EE prevents server-side features including recording, transcription, and breakout rooms since the server cannot process encrypted content.
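The principle can be illustrated outside the browser. The sketch below is not the insertable streams API and not any platform's actual key exchange; it uses Node's built-in crypto module with AES-256-GCM to show that a forwarding server can relay encrypted frames it has no key to read. The function names and the shared-key setup are assumptions made for the example.

```javascript
const nodeCrypto = require("crypto");

// Key shared among participants out of band; the forwarding server never sees it.
const sharedKey = nodeCrypto.randomBytes(32);

// Sender side: encrypt one encoded frame with a fresh nonce.
function encryptFrame(key, frame) {
  const iv = nodeCrypto.randomBytes(12);
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(frame), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Receiver side: decrypt and verify; throws if the key or data is wrong.
function decryptFrame(key, { iv, ciphertext, tag }) {
  const decipher = nodeCrypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Stand-in for the SFU: it relays the packet unchanged and holds no key.
function forwardPacket(packet) {
  return packet;
}

const originalFrame = Buffer.from("encoded video frame bytes");
const relayed = forwardPacket(encryptFrame(sharedKey, originalFrame));
console.log(decryptFrame(sharedKey, relayed).equals(originalFrame));
```

Because the server only ever handles ciphertext, anything that needs the decoded media, such as recording or transcription, has to happen on a participant's device instead.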

Jitsi implements E2EE using insertable streams in Chromium-based browsers. Participants see a shield icon indicating E2EE status. All participants must use compatible browsers; mobile apps may not support E2EE. Commercial platforms vary in E2EE availability: Zoom offers E2EE on paid plans with acknowledged feature limitations; Microsoft Teams provides E2EE for one-to-one calls but not group meetings at time of writing.

Meeting access controls prevent unauthorised participants from joining or disrupting meetings. A waiting room or lobby holds participants until the host admits them, enabling visual verification of identity before granting access. Meeting passwords add a knowledge factor, though passwords shared via email alongside meeting links provide limited additional security. Authenticated access requiring organisational login before join provides stronger identity verification for internal meetings.

For meetings with external participants from partner organisations or communities, authentication requirements create friction that may exclude intended participants. A balanced approach uses authenticated access for internal meetings and password-protected or open-access meetings for external collaboration, with waiting room moderation for sensitive discussions.
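The balanced approach can be written down as a small policy function, which is useful when meeting templates are generated by script. The field names below are hypothetical and do not correspond to settings in any particular platform.

```javascript
// Map meeting characteristics to access controls, following the balanced
// approach described above. Field names are hypothetical, not platform settings.
function accessPolicy({ external, sensitive, usePassword = true }) {
  return {
    requireOrganisationalLogin: !external,    // authenticated access for internal meetings
    requirePassword: external && usePassword, // external: password-protected or open access
    waitingRoom: sensitive,                   // host admits participants for sensitive discussions
  };
}

console.log(accessPolicy({ external: false, sensitive: false }));
console.log(accessPolicy({ external: true, sensitive: true }));
console.log(accessPolicy({ external: true, sensitive: false, usePassword: false }));
```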

Recording consent and notification present legal requirements in many jurisdictions. Some regions require all-party consent for recording; others permit single-party consent. Platform configuration can enforce recording notification banners, but these do not prevent a determined participant from using external screen recording tools. Meeting policies should clarify recording practices and expectations regardless of technical controls.

Accessibility

Accessible video conferencing enables full participation by people with diverse abilities. Accessibility considerations span visual, auditory, motor, and cognitive dimensions.

Live captions provide text representation of spoken content for deaf and hard-of-hearing participants. Automatic captions have improved significantly but remain imperfect; for critical meetings, human captioners provide higher accuracy. Caption display should offer customisable font size, positioning, and colours to accommodate individual preferences.

Screen reader compatibility enables blind and low-vision users to navigate meeting controls, participant lists, and chat using assistive technology. Platform evaluation should include testing with common screen readers (NVDA, JAWS, VoiceOver) for keyboard navigation, control labelling, and focus management.

Keyboard navigation allows users who cannot use a mouse to access all meeting functions. Essential keyboard shortcuts include mute/unmute, camera toggle, screen share, and raise hand. Platforms should document keyboard shortcuts and ensure consistent behaviour.

Sign language interpretation in video meetings requires interpreter video to remain visible regardless of active speaker switching. Gallery view or pinning interpreter video prevents interpretation from disappearing during screen shares. Some platforms offer dedicated interpretation channels where the interpreter receives a separate video feed.

Cognitive accessibility benefits from clear, consistent interfaces without unnecessary complexity. Features supporting cognitive accessibility include visible meeting timers, clear indication of who is speaking, and chat message notifications that do not disrupt focus.

The Accessibility Standards reference provides specific WCAG criteria applicable to video conferencing platforms.

External Collaboration Patterns

Video conferencing with external participants, whether partner organisations, donors, government officials, or community members, requires different considerations than internal meetings.

For meetings with partner organisations, federation enables users to join across organisational boundaries while maintaining identity within their home system. Microsoft Teams and Zoom support federation between organisations with established trust relationships. Federation reduces friction compared to guest access but requires administrative configuration.

Guest access provisions a temporary or limited account for external participants. This adds friction but provides clearer audit trails and enables access controls. Guest links providing one-click access without authentication offer lowest friction but minimal identity verification.

Community engagement via video conferencing faces connectivity and technology access barriers. Participants may join from shared devices, low-bandwidth connections, or mobile phones with limited data plans. Accommodating these scenarios requires audio-only fallback (allowing dial-in via telephone), low-bandwidth platform modes, and tolerance for participants joining and dropping as connectivity fluctuates.

Data minimisation for external meetings limits what information about participants and meeting content persists. Disabling automatic recording, using ephemeral chat, and avoiding features that collect participant data beyond what the meeting requires align with privacy-by-design principles.

Implementation Considerations

For organisations with limited IT capacity

A single IT person or staff member managing technology alongside other duties should prioritise platforms minimising operational overhead. Cloud-hosted commercial platforms eliminate server management, patching, and scaling concerns. The nonprofit programmes from major providers make commercial platforms cost-effective for smaller organisations.

For organisations already using Microsoft 365 or Google Workspace, the integrated video conferencing (Teams or Meet) adds no additional cost or management burden. Integration with existing identity, calendar, and file storage reduces complexity.

If data jurisdiction concerns preclude commercial cloud platforms, Jitsi Meet offers straightforward self-hosting. A single server handles meetings for organisations under 100 staff. Installation using Docker or the official package repository completes in under an hour. The jitsi-meet-quick-install script automates basic configuration including SSL certificate provisioning via Let’s Encrypt.

For organisations with established IT functions

Larger IT teams can evaluate self-hosted deployments against commercial platforms based on organisational requirements rather than resource constraints alone. Self-hosted Jitsi or BigBlueButton provide data sovereignty, customisation flexibility, and elimination of per-user costs, at the cost of taking on infrastructure management.

Multi-server Jitsi deployments scale to thousands of concurrent participants across many simultaneous meetings. The architecture separates components: the Prosody XMPP server and Jicofo (the conference focus) handle signalling; jitsi-videobridge instances handle media; multiple videobridges behind a load balancer enable horizontal scaling. Each videobridge server handles approximately 100-150 participants depending on specifications.
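These figures support a rough sizing estimate. The sketch below combines the 100-150 participants per videobridge quoted here with the one-recording-per-Jibri figure from the Recording and Transcription section; the function name and the conservative default of 100 are assumptions, and real capacity should be confirmed by load testing.

```javascript
// Rough server counts for a multi-server Jitsi deployment.
// participantsPerBridge defaults to the conservative end of the 100-150 range.
function planCapacity(peakParticipants, concurrentRecordings, participantsPerBridge = 100) {
  return {
    videobridges: Math.max(1, Math.ceil(peakParticipants / participantsPerBridge)),
    jibriInstances: concurrentRecordings, // one Jibri handles one recording at a time
  };
}

console.log(planCapacity(1200, 3));      // conservative estimate
console.log(planCapacity(1200, 3, 150)); // optimistic estimate
```

Planning against the conservative figure and treating the optimistic one as headroom leaves room for meetings with many active video senders, which load a bridge more heavily than audio-dominated ones.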

Room system deployment at scale benefits from centralised device management. Commercial platforms provide admin consoles showing device health, software versions, and usage analytics. Platform-agnostic room systems require separate management tooling.

For field deployments

Field offices with constrained bandwidth require explicit configuration for low-bandwidth operation. Apply platform settings limiting resolution and participant video counts as described in the Bandwidth Requirements section. Provide staff guidance on when to disable video entirely, preserving bandwidth for audio and essential screen sharing.

Pre-scheduled meetings with field offices should account for connectivity variability. Build buffer time into agendas for technical setup. Establish fallback communication channels (phone bridge, messaging) if video connectivity fails. Test connectivity before critical meetings rather than discovering problems at meeting start.

Training staff on bandwidth-conscious practices includes muting video when not speaking in large meetings, using speaker view rather than gallery view to reduce incoming streams, and closing other bandwidth-consuming applications during video calls.
