THE PUNTUEUS OBSERVATORY
SITUATION OF THE .EUS DOMAIN
Quantitative and qualitative analysis of the domain: number of domain names and their distribution in terms of territory, type of organisation and level of domain penetration.
THE INTERNET IN THE BASQUE COUNTRY
Number and trends in the domain names registered in the Basque Country by the main TLDs
THE BASQUE LANGUAGE ON THE INTERNET IN THE BASQUE COUNTRY
The presence of Basque on the Internet, in the main TLDs and in the social networks
This study analyses not only how far the .EUS domain and the main Internet domains have penetrated the Basque Country, but also the presence of Basque and the other dominant languages.
To do this, the entire content of the websites corresponding to the domains was analysed and classified by language. This makes it possible to determine how much of the content in each of the Basque Country's domains is in Basque, Spanish, English, French and other languages. Two strategies were used to conduct this domain analysis:
- Domain level web searches: The idea behind this strategy is to use web search engines (Google, Bing, etc.) to measure how much presence a language has in a website. By running a search restricted to a specific domain and made up of the most significant words in a language (language filter words), we can estimate the number of content items in that language without having to download the website content. That is why we use this strategy to process large websites. We do not apply it to small websites, because many websites with little content are not fully indexed by the search engines. To confirm that the language filter words have worked correctly, we use statistical models to classify the first results returned by the search engines by language, and so check the page counts that the search engines return for each language filter.
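The search-based strategy can be illustrated with a minimal sketch. It is not the observatory's actual implementation: the filter words, the function names and the choice to require every filter word are assumptions made for illustration, and the hit counts are taken as given rather than fetched from any search engine. The only search feature relied on is the `site:` operator, which search engines such as Google and Bing support.

```python
from typing import Dict, List

# Hypothetical filter words: common Basque function words, chosen for illustration only.
BASQUE_FILTER_WORDS = ["eta", "da", "ez", "bat", "dira"]


def build_language_query(domain: str, filter_words: List[str]) -> str:
    """Build a query restricted to `domain` that asks for all filter words.

    Requiring every word is an assumption; the study does not say how the
    filter words are combined.
    """
    if not filter_words:
        raise ValueError("at least one filter word is required")
    terms = " ".join(f'"{word}"' for word in filter_words)
    return f"{terms} site:{domain}"


def language_share(hits_by_language: Dict[str, int]) -> Dict[str, float]:
    """Turn per-language hit counts for one domain into proportions."""
    total = sum(hits_by_language.values())
    if total == 0:
        # Nothing indexed for this domain: report zero for every language.
        return {lang: 0.0 for lang in hits_by_language}
    return {lang: hits / total for lang, hits in hits_by_language.items()}


if __name__ == "__main__":
    print(build_language_query("example.eus", BASQUE_FILTER_WORDS))
    # Hit counts below are made-up inputs, not results from the study.
    print(language_share({"eu": 30, "es": 70}))
```

Keeping the query construction separate from the share calculation means the same proportions can be computed whichever search engine supplies the counts.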
This measuring process is complex, so even though both strategies are highly precise, there is a margin of error. The process comprises a number of steps, each with a small error rate, and these errors accumulate along the whole chain. According to our calculations, the precision of the measurement results is between 70% and 80%.
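To see how small per-step errors can add up to an overall figure in this range, consider the rough sketch below. It assumes the steps fail independently, so that the overall precision is the product of the per-step precisions. The number of steps and the 95% figure are invented for illustration and are not taken from the study.

```python
from math import prod
from typing import Iterable


def chain_precision(step_precisions: Iterable[float]) -> float:
    """Overall precision of a chain of steps, assuming independent errors."""
    steps = list(step_precisions)
    for p in steps:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"precision must be between 0 and 1, got {p}")
    return prod(steps)


if __name__ == "__main__":
    # Five hypothetical steps, each 95% precise.
    print(round(chain_precision([0.95] * 5), 4))  # 0.7738
```

Under these assumptions, five steps that are each 95% precise yield an overall precision of roughly 77%.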
We are incorporating improvements into the system to reduce this error rate in the yearly measurements, and these improvements are making their presence felt in the results. In the 2017 analysis, for example, more redirects were taken into consideration in the crawling process, and many domain parking pages were automatically blocked. We have also used a new statistical language model to identify pieces of text in Basque within multilingual texts, and we have taken websites with very short texts into consideration.
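The idea of picking out Basque passages inside a multilingual text can be sketched as follows. This is a deliberately simplified stand-in, not the statistical language model used in the study: it splits the text into sentence-like segments and labels each one by counting matches against tiny, hand-picked function-word lists. The word lists, language codes and function names are all assumptions.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Tiny illustrative function-word lists; a real model would be trained on corpora.
MARKERS: Dict[str, Set[str]] = {
    "eu": {"eta", "da", "ez", "bat", "dira", "du", "ere", "hau"},
    "es": {"el", "la", "los", "las", "que", "es", "una", "y"},
    "en": {"the", "and", "is", "of", "to", "in", "a", "that"},
}


def guess_language(segment: str) -> Optional[str]:
    """Return the language whose marker words occur most often, or None."""
    words = re.findall(r"[^\W\d_]+", segment.lower())
    scores = {
        lang: sum(word in markers for word in words)
        for lang, markers in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def label_segments(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text on sentence ends or line breaks and label each segment."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    segments = [part.strip() for part in parts if part.strip()]
    return [(segment, guess_language(segment)) for segment in segments]


if __name__ == "__main__":
    sample = "Kaixo, hau gure webgunea da. The site is open to all of you."
    for segment, lang in label_segments(sample):
        print(lang, "|", segment)
```

Labelling segment by segment, rather than assigning one language to the whole page, is what lets a short Basque passage be counted even when most of the page is in another language.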
In addition to these automatic strategies, the presence of Basque in the case of the .EUS domains was measured manually, thus achieving a higher level of precision.
With respect to domain distribution, the following domains were analysed:
- gTLDs or generic Top Level Domains: .EUS, .COM, .NET, .INFO, .ORG and .BIZ
- ccTLDs or country code Top Level Domains: The .ES and .FR domains were analysed. In the case of the .EUS domain, some of the data needed for the language analyses are not public, and this is indicated throughout the corresponding analysis.