# ner_annotation/constants.py
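# Tag definitions. "name" is the key that tagged spans in NER_DATA refer to;
# "label" and "color" (a hex color) are presumably used for display.
NER_TAGS = [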
{"name": "organization", "label": "Organization", "color": "#F1CBCB"},
{"name": "metric", "label": "Metric", "color": "#CAEACA"}
]
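# Sample documents. Each document is a list of spans: a span with a "tag" key
# is an annotated entity, a span without one is plain, untagged text.
NER_DATA = [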
[
{"text": "At "},
{"text": "Santander", "tag": "organization"},
{"text": " our mission is to help people and businesses prosper. "},
{"text": "We are always looking for ways to help our customers understand their financial health "},
{"text": "and identify which products and services might help them achieve their monetary goals. "},
{"text": "Our data science team is continually challenging our machine learning algorithms, working with "},
{"text": "the global data science community to make sure we can more accurately identify new ways "},
{"text": "to solve our most common challenge, binary classification problems such as: "},
{"text": "is a customer satisfied? Will a customer buy this product? Can a customer pay this loan? "},
{"text": "In this challenge, we invite Kagglers to help us identify which customers will make "},
{"text": "a specific transaction in the future, irrespective of the amount of money transacted. "},
{"text": "The data provided for this competition has the same structure as the real data we have available "},
{"text": "to solve this problem."}
],
[
{"text": "Many people struggle to get loans due to insufficient or non-existent credit histories. "},
{"text": "And, unfortunately, this population is often taken advantage of by untrustworthy lenders. "},
{"text": "Home Credit", "tag": "organization"},
{"text": " strives to broaden financial inclusion for the unbanked population by providing "},
{"text": "a positive and safe borrowing experience. "},
{"text": "In order to make sure this underserved population has a positive loan experience, "},
{"text": "Home Credit", "tag": "organization"},
{"text": " makes use of a variety of alternative data--including telco & transactional information"},
{"text": "--to predict their clients' repayment abilities. While "},
{"text": "Home Credit", "tag": "organization"},
{"text": " is currently using various statistical and machine learning methods to make "},
{"text": "predictions, they're challenging Kagglers to help them unlock "},
{"text": "the full potential of their data. "},
{"text": "Doing so will ensure that clients capable of repayment are not rejected "},
{"text": "and that loans are given with a principal, maturity, and repayment calendar that will empower "},
{"text": "their clients to be successful."}
],
[
{"text": "Imagine standing at the check-out counter at the grocery store with a long line behind you "},
{"text": "and the cashier not-so-quietly announces that your card has been declined. "},
{"text": "In this moment, you probably aren’t thinking about the data science that determined your fate. "},
{"text": "Embarrassed, and certain you have the funds to cover everything needed for an epic "},
{"text": "nacho party for 50 of your closest friends, you try your card again. "},
{"text": "Same result. As you step aside and allow the cashier to tend to the next customer, "},
{"text": "you receive a text message from your bank. "},
{"text": "'Press 1 if you really tried to spend $500 on cheddar cheese.' "},
{"text": "While perhaps cumbersome (and often embarrassing) in the moment, "},
{"text": "this fraud prevention system is actually saving consumers millions of dollars per year. "},
{"text": "Researchers from the "},
{"text": "IEEE Computational Intelligence Society (IEEE-CIS)", "tag": "organization"},
{"text": " want to improve this figure, while also improving the customer experience. With higher "},
{"text": "accuracy", "tag": "metric"},
{"text": " fraud detection, you can get on with your chips without the hassle. "},
{"text": "IEEE-CIS", "tag": "organization"},
{"text": " works across a variety of AI and machine learning areas, including deep neural networks, "},
{"text": "fuzzy systems, evolutionary computation, and swarm intelligence. "},
{"text": "Today they’re partnering with the world’s leading payment service company, "},
{"text": "Vesta Corporation", "tag": "organization"},
{"text": ", seeking the best solutions for the fraud prevention industry, "},
{"text": "and now you are invited to join the challenge. "},
{"text": "In this competition, you’ll benchmark machine learning models on a challenging large-scale dataset. "},
{"text": "The data comes from "},
{"text": "Vesta", "tag": "organization"},
{"text": "'s real-world e-commerce transactions "},
{"text": "and contains a wide range of features from device type to product features. "},
{"text": "You also have the opportunity to create new features to improve your results. "},
{"text": "If successful, you’ll improve the efficacy of fraudulent transaction alerts for millions of people "},
{"text": "around the world, helping hundreds of thousands of businesses reduce their "},
{"text": "fraud loss", "tag": "metric"},
{"text": " and increase their "},
{"text": "revenue", "tag": "metric"},
{"text": ". And of course, you will save party people just like you the hassle of "},
{"text": "false positives", "tag": "metric"},
{"text": "."}
],
[
{"text": "How much camping gear will one store sell each month in a year? "},
{"text": "To the uninitiated, calculating sales at this level may seem as difficult as predicting the weather. "},
{"text": "Both types of forecasting rely on science and historical data. "},
{"text": "While a wrong weather forecast may result in you carrying around an umbrella on a sunny day, "},
{"text": "inaccurate business forecasts could result in actual or opportunity losses. "},
{"text": "In this competition, in addition to traditional forecasting methods you’re also challenged to use "},
{"text": "machine learning to improve forecast "},
{"text": "accuracy", "tag": "metric"},
{"text": ". The Makridakis Open Forecasting Center (MOFC) at the "},
{"text": "University of Nicosia", "tag": "organization"},
{"text": " conducts cutting-edge forecasting research and provides business forecast training. "},
{"text": "It helps companies achieve accurate predictions, estimate the levels of uncertainty, "},
{"text": "avoid costly mistakes, and apply best forecasting practices. "},
{"text": "The MOFC is well known for its Makridakis Competitions, the first of which ran in the 1980s. "},
{"text": "In this competition, the fifth iteration, you will use hierarchical sales data from Walmart, "},
{"text": "the world’s largest company by "},
{"text": "revenue", "tag": "metric"},
{"text": ", to forecast daily sales for the next 28 days. "},
{"text": "The data covers stores in three US states (California, Texas, and Wisconsin) "},
{"text": "and includes item level, department, product categories, and store details. "},
{"text": "In addition, it has explanatory variables such as "},
{"text": "price, promotions, day of the week, and special events. "},
{"text": "Together, this robust dataset can be used to improve forecasting "},
{"text": "accuracy", "tag": "metric"},
{"text": ". If successful, your work will continue to advance the theory and practice of forecasting. "},
{"text": "The methods used can be applied in various business areas, such as setting up appropriate "},
{"text": "inventory or service levels. Through its business support and training, "},
{"text": "the MOFC will help distribute the tools and knowledge so others can achieve more accurate "},
{"text": "and better calibrated forecasts, reduce waste and be able to appreciate uncertainty and its risk "},
{"text": "implications."}
],
[
{"text": "Nothing ruins the thrill of buying a brand new car more quickly than seeing your new insurance bill. "},
{"text": "The sting’s even more painful when you know you’re a good driver. "},
{"text": "It doesn’t seem fair that you have to pay so much if you’ve been cautious on the road for years. "},
{"text": "Porto Seguro, one of Brazil’s largest auto and homeowner insurance companies, completely agrees. "},
{"text": "Inaccuracies in car insurance companies’ claim predictions raise the cost of insurance for "},
{"text": "good drivers and reduce the price for bad ones. "},
{"text": "In this competition, you’re challenged to build a model that predicts the probability that "},
{"text": "a driver will initiate an auto insurance claim in the next year. While "},
{"text": "Porto Seguro", "tag": "organization"},
{"text": " has used machine learning for the past 20 years, "},
{"text": "they’re looking to Kaggle’s machine learning community to explore new, more powerful methods. "},
{"text": "A more accurate prediction will allow them to further tailor their prices, and hopefully "},
{"text": "make auto insurance coverage more accessible to more drivers."}
]
]
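
# The span structure above can be consumed with a few small helpers. What
# follows is a minimal sketch, not part of the original module: the helper
# names (document_text, extract_entities, unknown_tags) and the sample_tags /
# sample_doc values are hypothetical, chosen only to mirror the shape of
# NER_TAGS and NER_DATA.

```python
from typing import Dict, List, Set, Tuple

Span = Dict[str, str]  # {"text": ...} with an optional "tag" key


def document_text(document: List[Span]) -> str:
    """Join the spans back into the full plain-text document."""
    return "".join(span["text"] for span in document)


def extract_entities(document: List[Span]) -> List[Tuple[str, str]]:
    """Return (text, tag) pairs for every span that carries a "tag" key."""
    return [(span["text"], span["tag"]) for span in document if "tag" in span]


def unknown_tags(documents: List[List[Span]], tags: List[Dict[str, str]]) -> Set[str]:
    """Return tag names used in the documents but missing from the tag list."""
    known = {tag["name"] for tag in tags}
    used = {span["tag"] for doc in documents for span in doc if "tag" in span}
    return used - known


# Hypothetical stand-in data with the same shape as NER_TAGS / NER_DATA.
sample_tags = [{"name": "organization", "label": "Organization", "color": "#F1CBCB"}]
sample_doc = [
    {"text": "At "},
    {"text": "Santander", "tag": "organization"},
    {"text": " our mission is to help people and businesses prosper. "},
]

if __name__ == "__main__":
    print(document_text(sample_doc))
    print(extract_entities(sample_doc))
    print(unknown_tags([sample_doc], sample_tags))
```

# Inside this module the same helpers could be called as
# unknown_tags(NER_DATA, NER_TAGS) to check that every tagged span refers to
# a tag defined in NER_TAGS.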