2019 talk:Technology outreach & innovation/URL shortener: How something so simple can be so complicated
- Partial notes on this presentation:
- Extension:ShortUrl was developed starting in May 2011
- An RFC followed in Nov 2012: https://w.wiki/9
- Work stalled on an abuse-protection issue in Dec 2016; see https://w.wiki/6vS
- The Wikidata team at WMDE took over and deployed it in March-April 2019, then handed it back to WMF for maintenance
- Some internals of a URL shortener:
- There are three layers: Apache + Varnish cache at the front end; Extension:UrlShortener; and the database
- The database has one column holding the shortened codes and, on the same row, another column holding the URL each code expands to
- The characters in a shortened URL often avoid 0 and o because they can be mistaken for one another; 1 also seems to be avoided. Codes can include the other digits, upper- and lower-case English letters, and a $ character
- Complications: cache; domain whitelist; abuse prevention; rate limit (DDoS protection); TLS/SSL + domain registration; dumps; URL shortening functionality; size limit; normalizing; Twitter; "Easter eggs"
- Abuse example: hiding a spam/advertising URL inside a shortened URL and then posting the shortened URL, so that the blacklisted URL evades detection; for an example see https://w.wiki/6vS
- Easter eggs might be strings found in the shortened URL
- For more, see https://meta.wikimedia.org/wiki/Wikimedia_URL_Shortener
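
The notes above on the code/URL table, the restricted character set and the domain whitelist can be illustrated with a small sketch. This is not the code of Extension:UrlShortener. The alphabet (digits and letters minus 0, O, o, 1, I, l, plus $), the scheme of deriving each code from a running counter, and the names `encode`, `decode` and `Shortener` are all assumptions made for illustration.

```python
import string
from urllib.parse import urlsplit

# Assumed alphabet: digits and letters without look-alikes, plus "$".
AMBIGUOUS = set("0Oo1Il")
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in AMBIGUOUS
) + "$"
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Turn a non-negative integer into a short code (base-N over ALPHABET)."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def decode(code: str) -> int:
    """Inverse of encode; raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


class Shortener:
    """In-memory stand-in for the two-column table: code <-> long URL."""

    def __init__(self, allowed_domains):
        self.allowed = set(allowed_domains)  # hypothetical domain whitelist
        self.code_to_url = {}
        self.url_to_code = {}
        self.next_id = 1

    def shorten(self, url: str) -> str:
        host = urlsplit(url).hostname or ""
        if host not in self.allowed:
            raise ValueError(f"domain not allowed: {host!r}")
        if url in self.url_to_code:  # reuse the existing row for a known URL
            return self.url_to_code[url]
        code = encode(self.next_id)
        self.next_id += 1
        self.code_to_url[code] = url
        self.url_to_code[url] = code
        return code

    def expand(self, code: str) -> str:
        return self.code_to_url[code]  # KeyError if the code is unknown
```

Usage, under the same assumptions: `Shortener({"meta.wikimedia.org"}).shorten("https://meta.wikimedia.org/wiki/Wikimedia_URL_Shortener")` returns a short code, and `expand` maps it back. A real deployment would also need the caching, rate limiting and normalizing listed under complications, which this sketch leaves out.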