NIST Speaker Recognition Evaluation
The goal of the NIST Speaker Recognition Evaluation (SRE) series is to contribute to the direction of research efforts and the calibration of technical capabilities of text-independent speaker recognition.
Summary
The overarching objective of the evaluations has always been to drive the technology forward, to measure the state of the art, and to find the most promising algorithmic approaches. To this end, NIST has been coordinating Speaker Recognition Evaluations since 1996, and more than 70 organizations have participated to date. New researchers in industry and academia are encouraged to participate each year, and collaboration between universities and industry is also welcomed. Each evaluation begins with the announcement of the official evaluation plan, which clearly states the tasks, data, performance metrics, and participation rules for the evaluation. The evaluation culminates in a follow-up workshop, where NIST reports the official results along with performance analyses, and researchers share and discuss their findings with NIST and one another.
SRE24 Schedule
- Evaluation Plan Published
- Registration Period
- Dev/Training data available
- Evaluation period
- System output and system descriptions due to NIST
- Evaluation results release
- Post-evaluation workshop
Contact Us
Please send questions to: sre_poc@nist.gov
For SRE24 discussion please visit our Google Group.
NIST 2024 Speaker Recognition Evaluation
Summary
The 2024 Speaker Recognition Evaluation (SRE24) is the next in an ongoing series of speaker recognition evaluations conducted by the US National Institute of Standards and Technology (NIST) since 1996. The objectives of the evaluation series are (1) to effectively measure system-calibrated performance of the current state of technology, (2) to provide a common framework that enables the research community to explore promising new ideas in speaker recognition, and (3) to support the community in their development of advanced technology incorporating these ideas. The evaluations are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to focus on core technology issues and to be simple and accessible to those wishing to participate.
SRE24 will be organized similarly to SRE21, focusing on speaker detection over conversational telephone speech (CTS) and audio from video (AfV). It will again offer cross-source (i.e., CTS and AfV) and cross-lingual trials, thanks to a multimodal and multilingual (i.e., with multilingual subjects) corpus collected outside North America. It will also introduce two new features compared to previous SREs: variable enrollment segment durations and shorter test segments.
SRE24 will offer both fixed and open training conditions to allow uniform cross-system comparisons and to understand the effect of additional and unconstrained amounts of training data on system performance. Similar to SRE21, SRE24 will consist of three tracks: audio-only, visual-only, and audio-visual, involving automatic person detection using audio, image, and video materials. System submission is required for the audio and audio-visual tracks, and optional for the visual track.
For more information about SRE24, please see the SRE24 Evaluation Plan or send questions to sre_poc@nist.gov.
SRE 2024 Tentative Schedule
Milestone | Date |
---|---|
Evaluation plan published | Jun |
Training data available | Jul |
Scoring code release | Jul |
Registration period | Jul - Sep |
Development data available to participants | Jul |
Evaluation period opens | Aug |
Fixed condition submissions due to NIST | Oct |
Open condition submissions due to NIST | Oct |
System descriptions due to NIST | Oct |
Official results released | Nov |
Workshop registration period | Nov |
Post-evaluation workshop | Dec 3-4, 2024 |
NIST LRE | Citation | Link |
---|---|---|
LRE17 | S. O. Sadjadi, T. Kheyrkhah, C. S. Greenberg, E. Singer, D. A. Reynolds, L. P. Mason, and J. Hernandez-Cordero, "Performance analysis of the 2017 NIST language recognition evaluation," in Proc. INTERSPEECH, 2018, pp. 1798–1802 | 10.21437/Interspeech.2018-69
LRE17 | S. O. Sadjadi, T. Kheyrkhah, A. Tong, C. S. Greenberg, D. A. Reynolds, E. Singer, L. P. Mason, and J. Hernandez-Cordero, "The 2017 NIST language recognition evaluation," in Proc. Odyssey, Les Sables d'Olonne, France, June 2018, pp. 82–89 | 10.21437/Odyssey.2018-12
LRE15 | H. Zhao, D. Bansé, G. Doddington, C. Greenberg, J. Hernández-Cordero, J. Howard, L. Mason, A. Martin, D. Reynolds, E. Singer, and A. Tong, "Results of the 2015 NIST language recognition evaluation," in Proc. INTERSPEECH, San Francisco, USA, September 2016, pp. 3206–3210 | 10.21437/Interspeech.2016-169
LRE96, LRE03, LRE05, LRE07, LRE09, LRE11 | A. F. Martin, C. S. Greenberg, J. M. Howard, G. R. Doddington, and J. J. Godfrey, “NIST language recognition evaluation - past and future,” in Odyssey 2014, Joensuu, Finland, June 2014, pp. 145–151 | 10.21437/Odyssey.2014-23 |
NIST SRE | Citation | Link |
---|---|---|
SRE21 | S. O. Sadjadi, C. Greenberg, E. Singer, L. Mason, and D. Reynolds, "The 2021 NIST speaker recognition evaluation," in Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, pp. 322–329 | 10.21437/Odyssey.2022-45
CTS Challenge | S. O. Sadjadi, C. Greenberg, E. Singer, L. Mason, and D. Reynolds, "The NIST CTS speaker recognition challenge," in Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, pp. 314–321 | 10.21437/Odyssey.2022-44
SRE19 | O. Sadjadi, C. Greenberg, E. Singer, D. Reynolds, L. Mason, and J. Hernandez-Cordero, “The 2019 NIST Audio-Visual Speaker Recognition Evaluation,” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), 2020, pp. 259–265 | 10.21437/Odyssey.2020-37 |
SRE19 CTS Challenge | S. O. Sadjadi, C. Greenberg, E. Singer, D. Reynolds, L. Mason, and J. Hernandez-Cordero, “The 2019 NIST Speaker Recognition Evaluation CTS Challenge,” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), 2020, pp. 266–272 | 10.21437/Odyssey.2020-38 |
SRE18 | S. O. Sadjadi, C. S. Greenberg, E. Singer, D. A. Reynolds, L. P. Mason, and J. Hernandez-Cordero, “The 2018 NIST speaker recognition evaluation,” in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 1483–1487 | 10.21437/Interspeech.2019-1351 |
SRE16 | S. O. Sadjadi, T. Kheyrkhah, A. Tong, C. S. Greenberg, D. A. Reynolds, E. Singer, L. P. Mason, and J. Hernandez-Cordero, “The 2016 NIST speaker recognition evaluation,” in Proc. INTERSPEECH, Stockholm, Sweden, August 2017, pp. 1353–1357 | 10.21437/Interspeech.2017-458 |
SRE96 - SRE06, SRE08, SRE10, SRE12 | C. S. Greenberg, L. P. Mason, S. O. Sadjadi, and D. A. Reynolds, “Two decades of speaker recognition evaluation at the National Institute of Standards and Technology,” Computer Speech & Language, vol. 60, 2020 | 10.1016/j.csl.2019.101032 |
NIST ivec | Citation | Link |
---|---|---|
ivec15 | A. Tong, C. Greenberg, A. Martin, D. Banse, J. Howard, H. Zhao, G. Doddington, D. Garcia-Romero, A. McCree, D. Reynolds, E. Singer, J. Hernandez-Cordero, and L. Mason, "Summary of the 2015 NIST language recognition i-vector machine learning challenge," in Proc. Odyssey 2016: The Speaker and Language Recognition Workshop, Bilbao, Spain, June 21-24 2016, pp. 297–302 | 10.21437/Odyssey.2016-43
ivec14 | D. Banse, G. R. Doddington, D. Garcia-Romero, J. J. Godfrey, C. S. Greenberg, A. F. Martin, A. McCree, M. A. Przybocki, and D. A. Reynolds, "Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge," in Proc. INTERSPEECH, Singapore, September 2014, pp. 368–372 | 10.21437/Interspeech.2014-86
NIST Misc | Citation | Link |
---|---|---|
LRE Homepage | NIST Language Recognition Evaluation | nist.gov/itl/iad/mig/language-recognition |
SRE Homepage | NIST Speaker Recognition Evaluation | nist.gov/itl/iad/mig/speaker-recognition |
Normalized Cross-Entropy paper | A tutorial introduction to the ideas behind Normalized Cross-Entropy and the information-theoretic idea of Entropy | nist.gov/file/411831 |
SPHERE sw | Speech file manipulation software (SPHERE) package version 2.7, 2012 | nist.gov/itl/iad/mig/tools |
Babel data | M. P. Harper, "Data resources to support the Babel program" | https://goo.gl/9aq958
DET curves | A. F. Martin, G. R. Doddington, T. Kamm, M. Ordowski, and M. A. Przybocki, "The DET curve in assessment of detection task performance," in Proc. EUROSPEECH, Rhodes, Greece, September 1997, pp. 1899–1903 | 10.21437/Eurospeech.1997-504
LDC Data | Citation | Link |
---|---|---|
SWB-1, rel2 | J. Godfrey and E. Holliman, "Switchboard-1 Release 2," 1993 | catalog.ldc.upenn.edu/LDC97S62 |
SWB-2, Pt1 | D. Graff, A. Canavan, and G. Zipperlen, "Switchboard-2 Phase I," 1998 | catalog.ldc.upenn.edu/LDC98S75 |
SWB-2, Pt2 | D. Graff, K. Walker, and A. Canavan, "Switchboard-2 Phase II," 1999 | catalog.ldc.upenn.edu/LDC99S79 |
SWB-2, Pt3 | D. Graff, D. Miller, and K. Walker, "Switchboard-2 Phase III," 2002 | catalog.ldc.upenn.edu/LDC2002S06 |
SWBCell, Pt1 | D. Graff, K. Walker, and D. Miller, "Switchboard Cellular Part 1 Audio," 2001 | catalog.ldc.upenn.edu/LDC2001S13 |
SWBCell, Pt2 | D. Graff, K. Walker, and D. Miller, "Switchboard Cellular Part 2 Audio," 2004 | catalog.ldc.upenn.edu/LDC2004S07 |
Fisher Eng Train, Pt1 Speech | C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Speech Part 1 Speech," 2004; C. Cieri, D. Miller, and K. Walker, "The Fisher corpus: A resource for the next generations of speech-to-text," in Proc. LREC, Lisbon, Portugal, May 2004, pp. 69–71 | catalog.ldc.upenn.edu/LDC2004S13; proceedings/lrec2004 |
Fisher Eng Train, Pt1 Transcripts | C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Speech Part 1 Transcripts," 2004 | catalog.ldc.upenn.edu/LDC2004T19 |
Fisher Eng Train, Pt2 Speech | C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Speech Part 2 Speech," 2004 | catalog.ldc.upenn.edu/LDC2005S13 |
Fisher Eng Train, Pt2 Transcripts | C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Speech Part 2 Transcripts," 2004 | catalog.ldc.upenn.edu/LDC2005T19 |
CallMyNet | K. Jones, S. Strassel, K. Walker, D. Graff, and J. Wright, "Call my net corpus: A multilingual corpus for evaluation of speaker recognition technology," in Proc. INTERSPEECH, Stockholm, Sweden, August 2017, pp. 2621–2624 | 10.21437/Interspeech.2017-1521 |
MLS/MLS14 | K. Jones, D. Graff, J. Wright, K. Walker, and S. Strassel, "Multi-language speech collection for NIST LRE," in Proc. LREC, Portoroz, Slovenia, May 2016, pp. 4253–4258 | jones-etal-2016-multi |
Mixer (pt. 1) | C. Cieri, J. P. Campbell, H. Nakasone, D. Miller, and K. Walker, "The Mixer corpus of multilingual, multichannel speaker recognition data," in Proc. LREC, Lisbon, Portugal, May 2004 | cieri-etal-2004-mixer |
Mixer (pt. 2) | C. Cieri, L. Corson, D. Graff, and K. Walker, "Resources for new research directions in speaker recognition: The Mixer 3, 4 and 5 corpora," in Proc. INTERSPEECH, Antwerp, Belgium, August 2007 | 10.21437/Interspeech.2007-340 |
Mixer (pt. 3) | L. Brandschain, D. Graff, C. Cieri, K. Walker, C. Caruso, and A. Neely, "The Mixer 6 corpus: Resources for cross-channel and text independent speaker recognition," in Proc. LREC, Valletta, Malta, May 2010, pp. 2441–2444 | lrec2010/792 |
VAST | J. Tracey and S. Strassel, "VAST: A corpus of video annotation for speech technologies," in Proc. LREC, Miyazaki, Japan, May 2018, pp. 4318–4321 | tracey-strassel-2018-vast |
SRE16 test set | S. O. Sadjadi, C. Greenberg, T. Kheyrkhah, K. Jones, K. Walker, S. Strassel, and D. Graff, "2016 NIST Speaker Recognition Evaluation Test Set," 2019 | catalog.ldc.upenn.edu/LDC2019S20 |
SRE21 dev/test set | S. O. Sadjadi, C. Greenberg, E. Singer, L. Mason, and D. Reynolds, "The 2021 NIST Speaker Recognition Evaluation" (LDC2021E10) | arxiv.org/abs/2204.10242 |
Janus multimedia dataset | G. Sell, K. Duh, D. Snyder, D. Etter, and D. Garcia-Romero, "Audio-visual person recognition in multimedia data from the IARPA Janus program," in Proc. IEEE ICASSP, 2018, pp. 3031–3035 (LDC2019E55) | 10.1109/ICASSP.2018.8462122 |
CTS Superset | S. O. Sadjadi, D. Graff, and K. Walker, "NIST SRE CTS Superset LDC2021E08," Web Download. Philadelphia: Linguistic Data Consortium, 2021; S. O. Sadjadi, "NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition," arXiv preprint arXiv:2108.07118, 2021 | 10.48550/arXiv.2108.07118 |
WeCanTalk | K. Jones, K. Walker, C. Caruso, J. Wright, and S. Strassel, "WeCanTalk: A new multi-language, multi-modal resource for speaker recognition," in Proc. LREC, 2022, pp. 3451–3456 | lrec2022-we-can-talk |
Contact Us
Please send questions to: sre_poc@nist.gov
For CTS Challenge discussion, please visit our Google Group: https://groups.google.com/a/list.nist.gov/forum/#!forum/cts-challenge
Summary
Following the success of the 2019 Conversational Telephone Speech (CTS) Speaker Recognition Challenge, which received 1347 submissions from 67 academic and industrial organizations, NIST organized a second CTS Challenge, which has been ongoing since 2020.
The basic task in the CTS Challenge is speaker detection, i.e., determining whether a specified target speaker is speaking during a given segment of speech. The CTS Challenge is a leaderboard-style challenge, offering an open/unconstrained training condition and using CTS recordings extracted from multiple data sources containing multilingual speech.
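Concretely, each detection trial pairs a target speaker's enrollment data with a test segment, and the system emits a score that is thresholded into a target/non-target decision. The sketch below is purely illustrative and not part of any NIST specification: it assumes a hypothetical embedding extractor has already produced fixed-dimensional speaker embeddings, and scores a trial by cosine similarity.

```python
import numpy as np

def detect_speaker(enroll_emb, test_emb, threshold=0.5):
    """Score one detection trial: cosine similarity between the target
    speaker's enrollment embedding and the test segment's embedding,
    then a hard threshold turns the score into a yes/no decision."""
    score = float(np.dot(enroll_emb, test_emb) /
                  (np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb)))
    return score, score >= threshold

# Toy demonstration with synthetic "embeddings": a near-duplicate of the
# enrollment vector stands in for a target trial, an unrelated random
# vector for a non-target trial.
rng = np.random.default_rng(0)
enroll = rng.standard_normal(256)
target_test = enroll + 0.1 * rng.standard_normal(256)
nontarget_test = rng.standard_normal(256)

print(detect_speaker(enroll, target_test))     # high score, decision True
print(detect_speaker(enroll, nontarget_test))  # near-zero score, decision False
```

In a real system the embeddings would come from a trained speaker encoder and the scores would typically be calibrated before thresholding; the threshold value here is arbitrary.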
For more information about the CTS Challenge, please visit the announcement page or send questions to sre_poc@nist.gov.
Participants are allowed to publish the leaderboard results unaltered, but they must not make advertising claims about their standing or ranking in, or winning of, the evaluation, nor claim NIST or U.S. Government endorsement of their system(s) or commercial product(s). See the evaluation plan for more details on the participation rules of the NIST CTS Challenge.
SRE24-CTS Challenge
RANK | TEAM | SET | TIMESTAMP | EER [%] | MIN_C | ACT_C |
---|---|---|---|---|---|---|
1 | AAP | Progress | 20241024-005638 | 2.37 | 0.102 | 0.106 |
2 | Neurotechnology | Progress | 20240902-043858 | 3.86 | 0.132 | 0.143 |
3 | SAR_ | Progress | 20240918-112439 | 3.00 | 0.080 | 1.000 |
3 | LIA_ | Progress | 20240809-174417 | 8.55 | 0.410 | 1.000 |
3 | TEAM-CERE-91 | Progress | 20240901-203209 | 14.93 | 0.509 | 1.000 |
3 | LIBRA | Progress | 20240908-031838 | 30.81 | 0.995 | 1.000 |
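The leaderboard metrics can be illustrated with a short sketch: the equal error rate (EER) is the operating point where miss and false-alarm rates coincide, and MIN_C is the normalized detection cost minimized over all thresholds (ACT_C evaluates the same cost at the system's own decision threshold). The parameter values below (p_target and the unit miss/false-alarm costs) are illustrative placeholders, not the official SRE24 cost model, which is defined in the evaluation plan.

```python
import numpy as np

def eer_and_min_dcf(tgt_scores, non_scores, p_target=0.05,
                    c_miss=1.0, c_fa=1.0):
    """Sweep a decision threshold over all observed scores and return
    (EER, min_C): equal error rate and minimum normalized detection cost."""
    tgt = np.asarray(tgt_scores, dtype=float)
    non = np.asarray(non_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Miss rate: target trials scored below the threshold.
    # False-alarm rate: non-target trials scored at or above it.
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    # EER: threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    eer = (p_miss[i] + p_fa[i]) / 2.0
    # Detection cost at each threshold, normalized by the cost of the
    # best trivial system (always accept or always reject).
    c_det = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(eer), float(c_det.min() / c_default)

# Perfectly separable toy scores give EER 0 and min_C 0.
eer, min_c = eer_and_min_dcf([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
print(eer, min_c)  # 0.0 0.0
```

A leaderboard row such as SAR_ above (low MIN_C but ACT_C of 1.000) shows why the two cost numbers are reported separately: a poorly calibrated decision threshold can make the actual cost degenerate to that of a trivial system even when a well-chosen threshold would perform well.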