Tad's IT Blog
Foreign-Language SEO: How to Make Your File Names
May 13th
As a follow-up to my last post on foreign-language SEO, here’s an interesting comparison of foreign-language search results with respect to page titles, URI paths, and how the browser handles non-7-bit-ASCII characters in URLs.
Take one phrase, “common cold”, and Google it in various languages:
Japanese:
Googling for “感冒” on google.jp gives 2 results in the top 10 with Japanese characters in the URL itself (which the browser, of course, percent-encodes before sending the request). 3 of the top 4 results have the phrase in Japanese characters in the title, and all of them have it in the snippet.
Interestingly, the #1 result has a percent-encoded version of the characters in its URL. At four bytes for two kanji it can’t be UTF-8, which needs three bytes per kanji; it appears to be the older EUC-JP encoding:
d.hatena.ne.jp/keyword/%B4%B6%CB%C1
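The byte-count argument is easy to check with Python’s standard library. This is a minimal sketch: `decode_path_segment` is a hypothetical helper, and the choice of EUC-JP is an assumption based on the two-bytes-per-kanji length, not something Hatena documents in this post.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_path_segment(segment: str, encoding: str) -> str:
    """Percent-decode a URL path segment using an explicit character encoding."""
    return unquote_to_bytes(segment).decode(encoding)


hatena_segment = "%B4%B6%CB%C1"   # path segment from the #1 result's URL
query = "感冒"

# Four raw bytes on the wire for two kanji...
print(len(unquote_to_bytes(hatena_segment)))   # 4
# ...but UTF-8 needs three bytes per kanji.
print(len(query.encode("utf-8")))               # 6

# Assumption: the segment is EUC-JP, a legacy two-bytes-per-kanji encoding.
print(decode_path_segment(hatena_segment, "euc_jp"))  # 感冒

# For comparison, the UTF-8 percent-encoding of the same phrase.
print(quote(query))
```

If a site picks a legacy encoding like this, the same keyword ends up with a different URL than it would on a UTF-8 site, which is one more thing to watch for when comparing results.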
Russian:
Googling for “простудой” on google.ru gives zero results with Russian characters in the URL in the first 3 pages of results. Most are CMS-driven sites whose URLs contain only Latin letters and numbers.
All of the top 10 results, though, have the keyword in Russian characters in the title, and most have it in the snippet as well.
Greek:
Similar to Russian, most of the top 10 results for “κοινο κρυολογημα” are CMS-type sites with the keyword in the title and description, but no keyword or transliteration of the keyword apparent in the URL itself.
German:
The German word for the common cold is “Erkältung”, which serves me well here because it has an a-umlaut (ä) in its spelling.
Googling for this on Google.de gives a very interesting result. The #1 result, a Wikipedia entry, is de.wikipedia.org/wiki/Erkältung, with the ä in the display URL. The #4 result has it as well: www.gesundheitpro.de/Erkältung.
However, the #3, #5, and #10 results use the transliterated “ae” spelling in the URL: www.erkaeltung-online.de, www.netdoktor.de/krankheiten/fakta/erkaeltung.htm and www.aspirin.de/erkaeltung/index.html. That suggests that, for European languages, using the transliterated version of these characters in SEO-sensitive elements like the filename, title and description may be just as effective.
I’m not sure what conclusions to draw from this, but it’s data. Anyone have insight or suggestions?
Gotchas of Foreign-Language SEO
May 13th
A passion of mine for some time has been figuring out the details of SEO for foreign languages and foreign character sets. How do you do search engine optimization for foreign character sets, and specifically for languages that don’t use the Latin (Roman) alphabet but instead use Cyrillic, Greek, or Chinese and Japanese characters (hanzi, kanji, kana)?
SEO is becoming more and more a normal thing to do, and less and less of a hidden black art. Google has made it plain many times that what it wants is good, fresh, relevant content, not a bunch of garbage.
Pursuant to that, there are a ton of fairly well-documented best practices for SEO’ing your site. And if you don’t know the first thing about SEO, read a good book on the subject. My favourites are:
Or you can just hit SEOMoz or SEOBook for some hot tips.
But one unfortunate thing is that most of the best SEO data comes from people who are ignorant Americans like me. Despite my love of geography and far-off places, I don’t speak any foreign language fluently, apart from some Korean swear words I learned from fellow soccer players.
What does that have to do with anything?
Take the picture I linked to above, of me taking a soccer throw-in.
Assuming you could edit that page, if you asked any search engine novice to optimize it to rank well for its subject matter, they’d probably tell you to hit the easy things first. They’d tell you to optimize the:
- HTML <title> tag
- <meta name="description"> tag
- <meta name="keywords"> tag
- <h1> text
- Body text
- Anchor text of inbound links
- Filename of the page
Ideally your page would have “Soccer Throw-In” (or something more distinctive) as its title and <h1> text, with a description and meta keywords to match. Ideally, too, you’d have a filename like “/soccer-throw-in.html” or similar.
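Generating that kind of filename from an English title is a one-liner with a regular expression. A minimal Python sketch; `slugify` and `page_filename` are hypothetical helpers, not any particular CMS’s API:

```python
import re


def slugify(title: str) -> str:
    """Lowercase a title and join its ASCII letters/digits with hyphens."""
    slug = title.lower()
    # Collapse every run of characters outside a-z/0-9 into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")


def page_filename(title: str) -> str:
    """Turn a page title into a /slug.html filename."""
    return "/" + slugify(title) + ".html"


print(page_filename("Soccer Throw-In"))  # /soccer-throw-in.html
```

This works fine in English, but it silently throws away anything outside a-z: slugify("Erkältung") gives "erk-ltung", and slugify("サッカー") gives an empty string. That is exactly the gotcha with foreign-language filenames.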
Easy, right? Of course it is — in English.
But let’s say you have similar pages in German, or worse, Japanese, Greek and Russian!
As an example, the Japanese word for “soccer” is “サッカー”. What do you use as the page title for that? The filename?
If you do a google.jp search for “サッカー”, one of the first results you get is a Wikipedia article for “サッカー”, which has a displayed URL of:
ja.wikipedia.org/wiki/サッカー
Now, of course, anyone with any technical sense will tell you that you can’t put non-7-bit-ASCII characters into a URL in an HTTP request: the URI spec (RFC 3986) only allows a subset of ASCII, and everything else has to be percent-encoded.
But pasting such a URL into your browser automatically percent-encodes it (as UTF-8) into:
http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC
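The browser’s conversion can be reproduced with Python’s standard library. A minimal sketch, assuming (as modern browsers do) that the path is encoded as UTF-8; `wiki_path` is a hypothetical helper that just mirrors Wikipedia’s /wiki/ layout:

```python
from urllib.parse import quote, unquote


def wiki_path(title: str) -> str:
    """Build a /wiki/<title> path in the form that goes over the wire."""
    # quote() encodes the string as UTF-8 by default and percent-escapes every
    # byte outside the unreserved ASCII set (leaving "/" alone).
    return "/wiki/" + quote(title)


path = wiki_path("サッカー")
print("http://ja.wikipedia.org" + path)
# http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC

# Decoding goes the other way, which is what a browser does for display.
print(unquote(path))              # /wiki/サッカー
print(wiki_path("Erkältung"))     # /wiki/Erk%C3%A4ltung
```

The Japanese URL in the search results and the %E3… URL in the request are the same address, just two views of it.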
So it has the benefit of (a) showing up with the proper Japanese term on the search engine results page, which improves the apparent relevance of the result, and (b) showing up in the top 10 at all, so you’d think it has SOME positive impact on ranking.
European terms are much easier, because many of the non-7-bit-ASCII characters they use have common transliterations.
For example, Google the beautiful German city of Düsseldorf. Clearly, you wouldn’t want to title all your pages “Dusseldorf”, as that would read as “village of idiots” (“Dussel” is German slang for a dimwit), whereas Düsseldorf is named after the Düssel, a small tributary of the Rhine. The ü is generally transliterated as “ue”, so Googling for “Duesseldorf” gives an acceptable result, since Google knows what you’re talking about.
It’s not so easy with other languages like Greek, Hebrew, Hindi, etc.
I’d be very interested in any input or feedback on this, as it’s a massive gray area right now, and I don’t know if ANYONE has it covered well.
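The “ue”-style transliteration is easy to automate for German. A minimal Python sketch; the `GERMAN_TRANSLIT` table and the `transliterate_german` and `german_slug` helpers are hypothetical and cover only ä, ö, ü and ß, so other European languages would need their own tables:

```python
import re
import unicodedata

# Hypothetical mapping of German special letters to their usual ASCII spellings.
GERMAN_TRANSLIT = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate_german(text: str) -> str:
    """Replace ä/ö/ü/ß with ae/oe/ue/ss, leaving everything else untouched."""
    # Normalize to composed form first, so "u" + combining diaeresis
    # becomes the single character "ü" that the table knows about.
    text = unicodedata.normalize("NFC", text)
    return text.translate(GERMAN_TRANSLIT)


def german_slug(title: str) -> str:
    """Build an ASCII filename stem from a German title."""
    ascii_title = transliterate_german(title).lower()
    return re.sub(r"[^a-z0-9]+", "-", ascii_title).strip("-")


print(transliterate_german("Düsseldorf"))  # Duesseldorf
print(german_slug("Erkältung"))            # erkaeltung
```

The explicit table is deliberate: a generic “strip the accents” approach (Unicode NFKD decomposition, then dropping the combining marks) would turn Düsseldorf into exactly the “Dusseldorf” spelling you want to avoid.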