One facet of the Open Access Repository Junction is that it will know which languages each repository uses and which country each repository [claims to] live in. The benefit is that the Junction could return a list of all repositories in a particular country (useful for regions such as Africa), or all those that offer a particular language in the interface (list all the French-speaking repositories in Canada, for example).
What is needed, however, is a definitive list of countries and a definitive list of languages.
Languages are fairly easy: the Library of Congress is the authoritative source for the ISO 639-2 list of languages, and provides a downloadable text file giving, for each language, the three-letter code, the two-letter code, the English name, and the French name.
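As a rough sketch of how that file could be loaded, the Python below assumes the download is pipe-delimited with five fields per line (bibliographic three-letter code, terminologic three-letter code, two-letter code, English name, French name). That layout, the `Language` type, and the `parse_languages` function are assumptions made for illustration, not part of the Junction itself.

```python
from typing import NamedTuple


class Language(NamedTuple):
    alpha3: str   # three-letter (bibliographic) code
    alpha2: str   # two-letter code; empty when the language has none
    english: str  # English name
    french: str   # French name


def parse_languages(lines):
    """Parse lines of an ISO 639-2 style text file into Language records.

    Assumed layout: alpha3-b|alpha3-t|alpha2|English name|French name
    """
    languages = []
    for line in lines:
        # Drop surrounding whitespace and a byte-order mark, if present.
        line = line.strip().lstrip("\ufeff")
        if not line:
            continue
        fields = line.split("|")
        if len(fields) != 5:
            continue  # skip anything that does not match the assumed layout
        alpha3_b, _alpha3_t, alpha2, english, french = fields
        languages.append(Language(alpha3_b, alpha2, english, french))
    return languages


# Two sample lines in the assumed layout.
sample_language_lines = [
    "aar||aa|Afar|afar",
    "fre|fra|fr|French|français",
]
parsed_languages = parse_languages(sample_language_lines)
```

Keeping the two-letter code alongside the three-letter one matters because many languages in the list have no two-letter code at all, so the three-letter code is the safer key.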
It would be nice to harvest the Wikipedia page on language codes as that includes the local name for the language… but that’s a “later version” thing.
Country codes are, again, seemingly simple: iso.org maintains the ISO 3166 list of codes and provides a free downloadable file of name:code pairs.
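A minimal sketch of reading such a name:code file follows. The separator is an assumption (a semicolon here, passed as a parameter so it can be changed), as are the header line in the sample and the `parse_countries` name; check the actual download before relying on any of it.

```python
def parse_countries(lines, sep=";"):
    """Build a {two-letter code: country name} mapping from name/code lines.

    The separator is an assumption; pass a different `sep` if the file differs.
    """
    countries = {}
    for line in lines:
        line = line.strip()
        if sep not in line:
            continue  # blank lines and anything without a separator
        # Split on the LAST separator so names containing it stay intact.
        name, code = line.rsplit(sep, 1)
        code = code.strip().upper()
        # Keep only rows whose code looks like a two-letter country code;
        # this also drops a header row, if the file has one.
        if len(code) == 2 and code.isalpha():
            countries[code] = name.strip()
    return countries


# Sample lines in the assumed layout, including a hypothetical header row.
sample_country_lines = [
    "Country Name;ISO 3166-1-alpha-2 code",
    "",
    "CANADA;CA",
    "KOREA, REPUBLIC OF;KR",
]
parsed_countries = parse_countries(sample_country_lines)
```

Keying the mapping on the code rather than the name means a repository record only needs to store two letters, and the display name can be looked up when a list is returned.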
The interesting data source, and again a “later version” enhancement, is Appendix D of the CIA World Factbook, which links each ISO 3166 code with the corresponding top-level domain (TLD) code.
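To illustrate why that cross-reference is worth having, here is a small sketch of a code-to-TLD lookup. For most countries the TLD is simply the ISO 3166 two-letter code in lower case, but not always: the United Kingdom's ISO code is GB while its domain is .uk. The `TLD_EXCEPTIONS` table and `tld_for` function are hypothetical, and the table holds only that one example; a real version would be filled from the Appendix D data.

```python
# Hypothetical exception table: ISO 3166 codes whose TLD differs from the code.
# Only one well-known case is listed; the full set would come from Appendix D.
TLD_EXCEPTIONS = {"GB": "uk"}


def tld_for(iso_code):
    """Return the likely top-level domain for an ISO 3166 two-letter code."""
    code = iso_code.upper()
    # Fall back to the lower-cased ISO code when there is no known exception.
    return "." + TLD_EXCEPTIONS.get(code, code.lower())


canada_tld = tld_for("CA")
uk_tld = tld_for("gb")
```

With a lookup like this, a repository's claimed country could be compared against the domain it is actually served from.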