Showing posts with label Hadoop. Show all posts

Wednesday, July 09, 2014

eBook of the Day: "Developing big data solutions on Microsoft Azure HDInsight" (aka Hadoop on Azure eBook @ 367 pages...)

Microsoft Downloads - Developing big data solutions on Microsoft Azure HDInsight – eBook Download

This guide explores the use of HDInsight in a range of use cases and scenarios such as iterative exploration, as a data warehouse, for ETL processes, and integration into existing BI systems. The guide is divided into three sections:

  • “Understanding Microsoft big data solutions,” provides an overview of the principles and benefits of big data solutions.
  • “Designing big data solutions using HDInsight,” contains guidance for designing solutions to meet the typical batch processing use cases inherent in big data processing.
  • “Implementing big data solutions using HDInsight,” explores a range of topics such as the options and techniques for loading data into an HDInsight cluster, the tools you can use in HDInsight to process data in a cluster, and the ways you can transfer the results from HDInsight into analytical and visualization tools to generate reports and charts, or export the results into existing data stores such as databases, data warehouses, and enterprise BI systems.

Version: 1.0

File Name:

Developing big data solutions on Microsoft Azure HDInsight.pdf

DevelopingBigDataSolutionsOnMicrosoftAzureHDInsight.epub

DevelopingBigDataSolutionsOnMicrosoftAzureHDInsight.mobi

Date Published: 7/8/2014

..."


Only 346 pages from the patterns & practices group on HDInsight (aka Hadoop)... :/

Thursday, September 26, 2013

Get a big jump into Big Data with the "Getting Started with Microsoft Big Data" series

Channel 9 - Getting Started with Microsoft Big Data

Developers, take this course to get an overview of Microsoft Big Data tools as part of the Windows Azure HDInsight and Storage services. As a developer, you'll learn how to create map-reduce programs and automate the workflow of processing Big Data jobs. As a SQL developer, you'll learn how Hive can make you instantly productive with Hadoop data.


Added to the billion and one things I need to learn ASAP. When I find the time (and the "want to"), this series looks like a great way to get started. I've done a tiny bit of Hadoop, and I already know I'm going to need all the help I can get up this learning curve...

Wednesday, August 28, 2013

Horton hears a Hadoop [set of icons]

Hortonworks - A Set of Hadoop-related Icons

The best architecture diagrams are those that impart the intended knowledge with maximum efficiency and minimum ambiguity. But sometimes there’s a need to add a little pizazz, and maybe even draw a picture or two for those Powerpoint moments.

We’ve built a small set of Hadoop-related icons that might help you next time you need that picture focusing on the intended function of various components. If you need the official logos then you can grab those from the various Apache project sites. We also put in some thoughts on how to use them for best effect, but feel free to ignore us and use them however you like.

For this v1.0 we’ve covered some basics of physical and software components and put them into a Powerpoint template. Love them? Hate them? Want something different? Let us know and we’ll see if we can add them to the set and maybe even build a Visio stencil…


Because everyone needs Hadoop icons.... um... right? Well if you do, here you go...

(via gigaom - Need help with your Hadoop marketecture? Here are some open source icons)

Wednesday, August 07, 2013

Hadoop Coloring Book (no kidding)

SHMsoft blog - Hadoop coloring book for kids

While daddy or mommy is hard at work on Hadoop, or is perhaps training at the Hadoop Illuminated training course, what are the kids to do?

Now there is an option. When the tired developer comes home in the evening, he can tell his kid what he was doing at work - in pictures.

The Hadoop Coloring Book for Kids was created by our talented illustrator and can be downloaded right here.


That's awesome. I printed a copy for myself... err... um... my... kids... yeah... them...  :P

Thursday, July 11, 2013

A little Hadoop, HDInsight, Mahout, some .Net and a little StackOverflow and you have...

Amazedsaint's Tech Journal - Building A Recommendation Engine - Machine Learning Using Windows Azure HDInsight, Hadoop And Mahout

Feel like helping someone today?

Let us help the Stack Exchange guys suggest questions to a user that he can answer, based on his answering history, much like the way Amazon suggests products based on your previous purchase history. If you don’t know what Stack Exchange does – they run a number of Q&A sites, including the massively popular Stack Overflow.

Our objective here is to see how we can analyze the past answers of a user to predict questions that he may answer in the future. Stack Exchange’s current recommendation logic may work better than ours, but that won’t prevent us from helping them for our own learning purposes.

We’ll be doing the following tasks.

  • Extracting the required information from Stack Exchange data set
  • Using the required information to build a Recommender
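The two bullets above boil down to a co-occurrence-style recommender, which is roughly the idea Mahout's item-based recommender generalizes. Here is a minimal sketch in plain Python — the data and names are made up for illustration, not taken from the Stack Exchange data set or from Mahout's API:

```python
from collections import defaultdict
from itertools import combinations

def recommend(user_items, target_user, top_n=3):
    """Score unseen items by how often they co-occur with the
    target user's items in other users' histories."""
    # Count how often each pair of items shows up together for a user.
    cooc = defaultdict(int)
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    seen = user_items[target_user]
    scores = defaultdict(int)
    for (a, b), count in cooc.items():
        if a in seen and b not in seen:
            scores[b] += count

    # Highest score first; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Toy answering history: which questions each user has answered.
history = {
    "alice": {"q1", "q2", "q3"},
    "bob":   {"q1", "q2"},
    "carol": {"q2", "q3", "q4"},
    "dave":  {"q1"},
}
```

Mahout does this at scale as a distributed MapReduce job over the co-occurrence matrix; the sketch just shows the shape of the computation.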

But let us start with the basics. If you are totally new to Apache Hadoop and Hadoop on Azure, I recommend you read these introductory articles before you begin, where I explain HDInsight and the MapReduce model in a bit more detail.

...

Conclusion

In this example, we were doing a lot of manual work to upload the required input files to HDFS and triggering the Recommender job manually. In fact, you could automate this entire workflow by leveraging the Hadoop For Azure SDK. But that is for another post, stay tuned. Real-life analysis has much more to do, including writing map/reducers for extracting and dumping data to HDFS, automating the creation of Hive tables, performing operations using HiveQL or Pig, etc. However, we just examined the steps involved in doing something meaningful with Azure, Hadoop and Mahout.

You may also access this data in your Mobile App or ASP.NET Web application, either by using Sqoop to export it to SQL Server, or by loading it into a Hive table as I explained earlier. Happy Coding and Machine Learning!! Also, if you are interested in scenarios where you could tie your existing applications with HDInsight to build end-to-end workflows, get in touch with me.


Just the article I've been looking for. It provides a nice start to finish view of playing with HDInsight and Mahout, which is something I was pulling my hair out over a few months ago...

Tuesday, February 26, 2013

Helping you Hadoop with "Hadoop illuminated" the free weBook...

SHMsoft blog - Announcing open access Hadoop book, "Hadoop illuminated"

"Friends, we would like to tell you about an open access book on Hadoop, called "Hadoop illuminated." You can find it here.

We want to make learning about Hadoop and its ecosystem fun and engaging. The book is accompanied by its project on GitHub. The book is a work in progress; we consider it in alpha stage. We will be updating and adding to it. Your feedback is welcome.

..."

Hadoop Illuminated


1.1. About "Hadoop illuminated"

This book is our experiment in making Hadoop knowledge available to the open source community. The book is freely available, and its code is open source.

We want this book to serve as an introduction, as a cookbook, and in later parts as an advanced manual.

"Hadoop illuminated" is a work in progress. Techniques get added; chapters are added and updated. We appreciate your feedback. You can follow it on Twitter, discuss it on Google Groups, or send your feedback to our emails.

..."

Hadoop is all the rage, right? Maybe this weBook will help you wrap your head around it (or not, but the price is just right anyway...)

Wednesday, February 29, 2012

Managed Hadoop.Net HDFS File Access (with C# examples)

Carl's Blog - Hadoop .Net HDFS File Access

"If you grab the latest installment of the Microsoft Distribution of Hadoop, you will notice, in addition to the C library, a Managed C++ solution for HDFS file access. This solution now enables one to consume HDFS files from within a .Net environment.

The purpose of this post is first to ensure folks know about the new Windows HDFS Managed library (WinHdfsManaged), provided alongside the native C library, and secondly to give a few samples of its usage from C#.



I still don't Hadoop, but I liked seeing a .Net Managed HDFS File Access assembly, and thought there might be 0.27 other people who might also find it interesting...

Tuesday, February 14, 2012

A "Big Data in Little Words" short introduction...

SQL Server Journey with SQL Authority - SQL SERVER – What is Big Data – An Explanation in Simple Words

"Decoding the human genome originally took 10 years to process; now it can be achieved in one week - The Economist.

This blog post is written in response to the T-SQL Tuesday post on Big Data. This is a very interesting subject. Data is growing every single day. I remember my first computer, which had a 1 GB hard drive. I told my dad that I would never need any more hard drive space; we were good for the next 10 years. I bought a much larger hard drive after 2 years, and today I have a NAS at home which can hold 2 TB, plus a few file hosts in the cloud as well. Well, the point is, the amount of data any individual deals with has increased significantly.

There was a time of floppy drives. Today some auto-correct software does not even recognize that word. However, USB drives, pen drives and jump drives are common names across the industry. It is a race – I really do not know where it will stop.

Big Data

In the same way, the amount of data has grown so wildly that a relational database is not able to handle its processing. Conventional RDBMSs face challenges processing and analyzing data beyond a certain very large size. Big Data is a large amount of data which is difficult or impossible for a traditional relational database to handle. The current moving target limits for Big Data are terabytes, exabytes and zettabytes.

Hadoop

...

MapReduce

...

Pigs and Hives

...

Microsoft and Big Data

...

..."

Little words fit so much better in my brain... :P

Wednesday, December 14, 2011

Three Apache Hadoop On Windows TechNet Wiki Articles - Home page to FAQ to FTP...

TechNet Articles - Apache Hadoop On Windows

This article contains links to information about using Apache Hadoop on Windows, or with other Microsoft technologies. It also provides a brief overview of Hadoop as well as overview information for the Hadoop offerings provided by Microsoft.

Table of Contents

Topics:
  • Hadoop Overview
  • Hadoop on Windows Overview
  • Apache Hadoop on Windows Server
  • Apache Hadoop on Windows Azure
  • Elastic Map Reduce on Windows Azure
  • Learning Hadoop
  • General
  • Hadoop on Windows
  • Hadoop Best Practices
  • Managing Hadoop
  • Developing with Hadoop
  • Using Hadoop with other BI Technologies

Content Types:
  • How To
  • Code Examples
  • Videos
  • Audio

...

Hadoop Overview

Apache Hadoop is an open source software framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It consists of two primary components: the Hadoop Distributed File System (HDFS) – reliable, distributed data storage – and MapReduce – a parallel and distributed processing system.

HDFS is the primary distributed storage used by Hadoop applications. As you load data into a Hadoop cluster, HDFS splits up the data into blocks/chunks and creates multiple replicas of blocks and distributes them across the nodes of the cluster to enable reliable and extremely rapid computations.
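That split-and-replicate idea can be modeled in a few lines. This is a deliberately simplified sketch: real HDFS placement is rack-aware rather than round-robin, and the 64 MB block size is just the common default of that era, not a fixed rule:

```python
def place_blocks(file_size, nodes, block_size=64 * 1024 * 1024, replicas=3):
    """Toy HDFS model: split a file of file_size bytes into fixed-size
    blocks and assign each block's replicas round-robin across nodes."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replicas)]
        for b in range(num_blocks)
    }

# A 200 MB file on a 4-node cluster becomes 4 blocks, each stored 3 times.
plan = place_blocks(200 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
```

Losing any single node still leaves two replicas of every block, which is what makes the "reliable and extremely rapid computations" claim above work.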

Hadoop MapReduce is a software framework for writing applications that rapidly process vast amounts of data in parallel on a large cluster of compute nodes. A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
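The map/sort/reduce flow described above can be simulated in-process with the classic word-count example. This is just an illustration of the programming model, not a real Hadoop job (a real job would run the map tasks in parallel across splits, with the framework doing the sort):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map step: each line of an input split becomes (word, 1) pairs.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # The framework sorts map output by key, so each reduce call
    # sees all values for one key grouped together.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield (key, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["the quick fox", "the lazy dog"])))
```

The same mapper and reducer, written against Hadoop's API (or as Hadoop Streaming scripts), would scale from these two lines of input to terabytes without changing the logic.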

Some of the main advantages of Hadoop are that it can process vast amounts of data – hundreds of terabytes to even petabytes – quickly and efficiently, process both structured and unstructured data, perform the processing where the data is rather than moving the data to the processing, and detect and handle failures by design.

There are two other technologies that are related to Hadoop: Hive and Pig. Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems such as HDFS. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

...

Hadoop on Windows

The links in this section provide information on deploying Apache Hadoop to Microsoft Windows Platforms.

  • Getting Started with Apache Hadoop for Windows – An overview of the Getting Started Guides currently available.
  • Getting Started Deploying an On-Premise Apache Hadoop Cluster – A walkthrough for deploying Apache Hadoop to a set of servers that you manage.
  • Getting Started with the Windows Azure Deployment of Apache Hadoop for Windows – A walkthrough for deploying Apache Hadoop compute instances on your Windows Azure subscription.
  • Getting Started using a Windows Azure Deployment of Hadoop on the Elastic Map Reduce Portal – A walkthrough for provisioning a temporary Apache Hadoop cluster using the Elastic Map Reduce (EMR) Portal.

..."

TechNet Articles - Apache Hadoop Based Services for Windows Azure How To and FAQ Guide

This content is a work in progress for the benefit of the Hadoop Community.

Please feel free to contribute to this wiki page based on your expertise and experience with Hadoop.

For asking questions, please use the Yahoo Group, http://tech.groups.yahoo.com/group/hadooponazurectp/

How-Tos

  1. Setup your Hadoop on Azure cluster
  2. How to run a job on Hadoop on Azure
  3. Interactive Console
    1. Tasks with the Interactive JavaScript Console
      • How to run Pig-Latin jobs from the Interactive JavaScript Console
      • How to create and run a JavaScript Map Reduce Job
    2. Tasks with Hive on the Interactive Console
  4. Remote Desktop
    1. Using the Hadoop command shell
    2. View the Job Tracker
    3. View HDFS
  5. Open Ports
    1. How to connect Excel to Hadoop on Azure via HiveODBC
    2. How to FTP data to Hadoop on Azure
  6. Manage Data
    1. Import Data from Data Market
    2. Setup ASV - use your Windows Azure Blob Store account
    3. Setup S3 - use your Amazon S3 account

..."

TechNet Articles - How To FTP Data To Hadoop on Windows Azure

How To FTP Data To Hadoop on Windows Azure

The Apache Hadoop distribution for Windows includes an FTP server that operates directly on the Hadoop Distributed File System (HDFS). The FTPS protocol is used for secure transfers. FTP communication is wire-efficient and especially suited for transferring large data sets. The steps below describe how to use the FTP server.

  1. Log into the portal at http://www.hadooponazure.com/.
  2. Click the Open Ports tile to access the FTP server port configuration.
  3. ...
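Once the port is open, scripting the transfer with something like Python's standard ftplib is one option. This is a hedged sketch, not from the article: the host, port, credentials and paths below are placeholders, and in practice they would come from the portal's Open Ports configuration:

```python
from ftplib import FTP_TLS

def connect_ftps(host, port, user, password):
    """Open a secure session to the cluster's FTPS endpoint.
    All connection values are placeholders from the caller."""
    ftps = FTP_TLS()
    ftps.connect(host, port)
    ftps.login(user, password)
    ftps.prot_p()  # encrypt the data channel as well as the control channel
    return ftps

def upload(ftps, local_path, remote_path):
    """Stream one local file into HDFS through the FTP server.
    Accepts any object with an ftplib-style storbinary() method."""
    with open(local_path, "rb") as f:
        ftps.storbinary("STOR " + remote_path, f)
```

Typical use would be `upload(connect_ftps("mycluster.example.com", 2222, "user", "pass"), "data.csv", "/user/me/data.csv")` – again, with real values substituted from the portal.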

..."

While I'm not Hadoop'ing yet, when I saw these I knew I wanted to grab them for future reference....

 

Related Past Post XRef:
Big day for Big Data... Hadoop coming to Windows Azure, Windows Server and SQL Server
Hadoop for the SQL Server DBA...
Do you Hadoop? Angel has your links, news and resources round-up...
Microsoft SQL Server Connector for Apache Hadoop RTW

Wednesday, November 23, 2011

Hadoop for the SQL Server DBA...

Brent Ozar PLF - Hadoop Basics for SQL Server DBAs

"Microsoft have made announcements about bringing Hadoop to Windows, but what does that mean for SQL Server? In this talk, Jeremiah Peschka will cover the Hadoop ecosystem at a high level and discuss specific use cases for Hadoop. By the end of this talk, you should have a general idea of how different parts of Hadoop can be used to augment SQL Server’s rich functionality. This talk is for database professionals who are looking for ways to extend the capability of SQL Server with different pieces of the Hadoop ecosystem.

..."

I thought this a good grab and share given all the recent SQL Server Hadoop news...

 

Related Past Post XRef:
Big day for Big Data... Hadoop coming to Windows Azure, Windows Server and SQL Server

Microsoft SQL Server Connector for Apache Hadoop RTW
Microsoft SQL Server Connector for Apache Hadoop CTP1
Do you Hadoop? Angel has your links, news and resources round-up...

Wednesday, October 12, 2011

Big day for Big Data... Hadoop coming to Windows Azure, Windows Server and SQL Server

The Official Microsoft Blog – News and Perspectives from Microsoft - Microsoft Expands Data Platform to Help Customers Manage the ‘New Currency of the Cloud’

"This morning, I gave a keynote at the PASS Summit 2011 here in Seattle, a gathering of about 4,000 IT professionals and developers worldwide. I talked about Microsoft’s roadmap for helping customers manage and analyze any data, of any size, anywhere -- on premises, and in the private or public cloud.

Microsoft makes this possible through SQL Server 2012 and through new investments to help customers manage ‘big data’, including an Apache Hadoop-based distribution for Windows Server and Windows Azure and a strategic partnership with Hortonworks. Our announcements today highlight how we enable our customers to take advantage of the cloud to better manage the ‘currency’ of their data.

...

Microsoft has a rich, decades-long legacy in helping customers get more value from their data. Beginning with OLAP Services in SQL Server 7, and extending to SQL Server 2012 features that span beyond relational data, we have a solid foundation for customers to take advantage of today. The new addition of an Apache Hadoop-based distribution for Windows Azure and Windows Server is the next building block, seamlessly connecting all data sizes and types. ..."

All About Microsoft (Mary Jo Foley) - Microsoft to develop Hadoop distributions for Windows Server and Azure

"Microsoft is stepping up its support for Hadoop with its Windows Server and Windows Azure deliverables and will be offering its contributions back to the Apache Software Foundation and the Hadoop project.

By developing its own implementations of the Hadoop stack, Microsoft is looking to provide customers with another option for big data/unstructured data storage and access, officials said.

Microsoft officials made the announcement at the SQL PASS Summit on October 12. Company execs also confirmed at the event what I’ve been expecting for the past month or so: SQL Server “Denali” will be officially named SQL Server 2012 when it ships in the first part of 2012. (Server and Tools boss Satya Nadella said earlier this year to expect Denali to ship in the “early part” of next year.) Denali is currently at the Community Technology Preview (CTP) 3 stage; next stop is RTM and general availability (no beta is on the roadmap).

Microsoft is going to be working with Hadoop core contributors from Yahoo Hadoop spinoff Hortonworks. Microsoft and Hortonworks are readying a CTP test build of their Hadoop-based service for Windows Azure for delivery before the end of calendar 2011 and a CTP of the Hadoop-based distribution for Windows Server some time in 2012 ...

..."

PHP Blogs from Port25 - Microsoft, Hadoop and Big Data

In a couple of weeks it will be my one year anniversary here at Microsoft and I couldn’t wish for a better anniversary gift: now that Microsoft has laid out its roadmap for Big Data, I’m really excited about the role that Apache Hadoop plays in this.

In case you missed it, Microsoft Corporate Vice President Ted Kummert announced earlier today that we are adopting Hadoop, with plans to deliver enterprise-class Apache Hadoop-based distributions on both Windows Server and Windows Azure.

...

Technical Considerations

On the more technical front, we have been working on a simplified download, installation and configuration experience of several Hadoop related technologies, including HDFS, Hive, and Pig, which will help broaden the adoption of Hadoop in the enterprise.

The Hadoop based service for Windows Azure will allow any developer or user to submit and run standard Hadoop jobs directly on the Azure cloud with a simple user experience.

Let me stress this once again: it doesn’t matter what platform you are developing your Hadoop jobs on – you will always be able to take a standard Hadoop job and deploy it on our platform, as we strive towards full interoperability with the official Apache Hadoop distribution.

This is great news as it lowers the barrier for building Hadoop based applications while encouraging rapid prototyping scenarios in the Windows Azure cloud for Big Data.

To facilitate all of this, we have also entered into a strategic partnership with Hortonworks that enables us to gain unique experience and expertise to help accelerate the delivery of Microsoft’s Hadoop based distributions on both Windows Server and Windows Azure.

For developers, we will enable integration with Microsoft developer tools as well as invest in making JavaScript a first-class language for Big Data. We will do this by making it possible to write high-performance Map/Reduce jobs using JavaScript. Yes, JavaScript Map/Reduce, you read that right.

..."

Straight Path IT Solutions - Microsoft Loves Your Big Data

"This week at the SQL PASS Summit, Ted Kummert – Corporate Vice President of the Business Platform Division at Microsoft (think SQL Server) made an announcement in Wednesday’s keynote presentation. It is an awesome announcement for companies who have “big data” (think semi structured or even unstructured large data sets. Think data that is perhaps a bit too bulky or requires too much formatting to analyze effectively in what we think of when we think of relational databases… Clicks, Tweets, Texts, credit card transactions, health care data streams, etc.) and want to have newer and better ways to work with it.

The Announcement?

You’ve heard of Hadoop? No? Well Jeremiah Peschka does a great job with a quick explanation in this post.

Well Microsoft wants you to work with data in Hadoop. So they announced that this is a large part of their data strategy and there are two really neat ways they are going to be implementing this:

...

Become a Player Themselves –> This one had me scratching my head when I first heard it, “wait, Microsoft wants to deploy hadoop on Windows and Azure?! The facebook’s and .com’s of the world won’t ever buy it, they love the open source community, they hate paying for licensing.” But then it hit me… They are not aiming for the flash and hip web companies who have already embraced hadoop… They are actually offering something in the market that has a really compelling story and call to action. ...

...

Why I’m Happy

Some would say that Microsoft sees a positive trend and tries to copy it normally. They try to make it their own and sometimes they get it right, sometimes they don’t. It is a copy though. They take some good and interesting ideas and “microsoft-ize” them. I’ve been working with (and loving) SQL Server for 12 years, so don’t get me wrong when I say this, but sometimes they miss the mark. With this? They aren’t copying, or borrowing or trying to redo… They are embracing. They are looking at why people use a tool like Hadoop. They are asking good questions about it and saying, let’s embrace the open source community their standards and all their work and let’s make a platform and integration for it. They are saying, “Hadoop – you do what you are great at, don’t go changing, here let’s help reach other customers and we’ll extend this great tool set with what we really know and are good at – enterprise support, manageability, instrumentation, reliability” That is cool. That is big.

..."

So in short, if you do Big Data, now's the time to start getting up to speed on Hadoop...

 

Related Past Post XRef:
Microsoft SQL Server Connector for Apache Hadoop RTW
Microsoft SQL Server Connector for Apache Hadoop CTP1
Do you Hadoop? Angel has your links, news and resources round-up...

Tuesday, October 04, 2011

Microsoft SQL Server Connector for Apache Hadoop RTW

Microsoft Downloads - Microsoft SQL Server Connector for Apache Hadoop

"Microsoft SQL Server Connector for Apache Hadoop (SQL Server-Hadoop Connector) RTM is a Sqoop-based connector that facilitates efficient data transfer between SQL Server 2008 R2 and Hadoop. Sqoop supports several databases.

Version: 1.0
Date Published: 10/4/2011

Language: English

  • Microsoft SQL Server-Hadoop Connector User Guide.pdf, 878 KB
  • SQL Server Connector for Apache Hadoop MSLT.pdf, 220 KB
  • sqoop-sqlserver-1.0.tar.gz, 1.0 MB
  • THIRDPARTYNOTICES FOR HADOOP-BASED CONNECTORS.txt, 33 KB

The Microsoft SQL Server Connector for Apache Hadoop extends JDBC-based Sqoop connectivity to facilitate data transfer between SQL Server and Hadoop, and also supports the JDBC features mentioned in the SQOOP User Guide on the Cloudera website. In addition, this connector provides support for the nchar and nvarchar data types.

With SQL Server-Hadoop Connector, you import data from:

  • tables in SQL Server to delimited text files on HDFS
  • tables in SQL Server to SequenceFiles files on HDFS
  • tables in SQL Server to tables in Hive*
  • result of queries executed on SQL Server to delimited text files on HDFS
  • result of queries executed on SQL Server to SequenceFiles files on HDFS
  • result of queries executed on SQL Server to tables in Hive*

Note: importing data from SQL Server into HBase is not supported in this release.

With SQL Server-Hadoop Connector, you can export data from:

  • delimited text files on HDFS to SQL Server
  • SequenceFiles on HDFS to SQL Server
  • Hive tables* to tables in SQL Server

* Hive is a data warehouse infrastructure built on top of Hadoop (http://wiki.apache.org/hadoop/Hive). We recommend using the hive-0.7.0-cdh3u0 version of Cloudera Hive.

Sqoop is an open source connectivity framework that facilitates transfer between multiple Relational Database Management Systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the imports and exports are performed in parallel with fault tolerance.

The Source / Target files being used by Sqoop can be delimited text files (for example, with commas or tabs separating each field), or binary SequenceFiles containing serialized record data. Please refer to section 7.2.7 in Sqoop User Guide for more details on supported file types. For information on SequenceFile format, please refer to Hadoop API page.

Supported Operating Systems: Linux, Windows Server 2008 R2

Linux (for Hadoop setup) and Windows (with SQL Server 2008 R2 installed). Both are required to use the SQL Server-Hadoop Connector.

..."

I don't Hadoop yet, but it's starting to get some interest and eyeballs in my day-job's field so want to keep an eye on it...

 

Related Past Post XRef:
Microsoft SQL Server Connector for Apache Hadoop CTP1
Do you Hadoop? Angel has your links, news and resources round-up...

Wednesday, August 24, 2011

Microsoft SQL Server Connector for Apache Hadoop CTP1

Microsoft Downloads - Microsoft SQL Server Connector for Apache Hadoop

"Microsoft SQL Server Connector for Apache Hadoop (SQL Server-Hadoop Connector) CTP is a Sqoop-based connector that facilitates efficient data transfer between SQL Server 2008 R2 and Hadoop. Sqoop supports several databases, including MySQL.

Version: CTP 1.0
Date Published: 8/24/2011

EULA_SQL Server Connector for Apache Hadoop_CTP (Sept 2011).docx - 23 KB

Microsoft SQL Server - Hadoop Connector User Guide.pdf - 782 KB

sqoop-sqlserver-1.0.tar.gz - 782 KB

THIRDPARTYNOTICES FOR HADOOP-BASED CONNECTORS.rtf - 85 KB

The Microsoft SQL Server Connector for Apache Hadoop extends JDBC-based Sqoop connectivity to facilitate data transfer between SQL Server and Hadoop, and also supports all the features mentioned in the SQOOP User Guide on the Cloudera website. In addition, this connector provides support for the nchar and nvarchar data types.

With SQL Server-Hadoop Connector, you import data from:

  • Tables in SQL Server to delimited text files on HDFS
  • Tables in SQL Server to SequenceFiles files on HDFS
  • Tables in SQL Server to tables in Hive*
  • Queries executed in SQL Server to delimited text files on HDFS
  • Queries executed in SQL Server to SequenceFiles files on HDFS
  • Queries executed in SQL Server to tables in Hive*

With SQL Server-Hadoop Connector, you can export data from:

  • Delimited text files on HDFS to SQL Server
  • SequenceFiles on HDFS to SQL Server
  • Hive Tables* to tables in SQL Server
    * Hive is a data warehouse infrastructure built on top of Hadoop (http://wiki.apache.org/hadoop/Hive)
..."

Not sure how/if I can use this, but since I've been keeping a wider eye open for Hadoop stuff, I thought it interesting...

 

Related Past Post XRef:
Do you Hadoop? Angel has your links, news and resources round-up...

Monday, August 15, 2011

Do you Hadoop? Angel has your links, news and resources round-up...

Angel “Java” Lopez on Blog - Hadoop: Links, News and Resources (1)

"After my posts with links about Scalability and MapReduce, it’s time to share my links about Hadoop...

..."

On a low priority background thread I've been keeping an eye open for Hadoop news, thinking that I should take a closer look at it, but I didn't really know where to start. Well, that problem has been solved! :P