Introducing Analyst’s Notebook in Savanna 4.2

It seems like only yesterday that we released Savanna 4.1, and here we are with a new release and exciting news: IBM Analyst’s Notebook is here! Savanna 4.2, our all-source analysis solution, adds many new features and enhancements, including integration with Analyst’s Notebook, giving you comprehensive, easy-to-use tools to make your analysis experience intuitive, fun and fast.

While the release includes a variety of new features and enhancements, we will focus on three key developments: integration with IBM Analyst’s Notebook, the enhanced Dashboard, and temporal Map filters.

IBM Analyst’s Notebook integration

The integration of Analyst’s Notebook’s data analysis capabilities and Savanna’s dynamic concept modeling tools offers a holistic suite of tools catering to all areas of investigation.

Savanna 4.2’s integration with Analyst’s Notebook enables analysts to work seamlessly with Analyst’s Notebook Charts (ANB Charts) within the Savanna environment. After compiling data in Analyst’s Notebook, you can easily upload, search and view ANB Charts in Savanna. All Chart data is indexed, so Savanna’s Search tool can pull key terms from within a Chart for quick discovery and analysis.

View Charts in Savanna


For example, you might be investigating cartel movements in Mexico and have a Chart file showing connections between numerous cartel members. After uploading the Chart to Savanna, you can Search, view and interact with the Chart. For instance, you might want to zoom in on key relationships between cartel members and take a screenshot to be used later in your analysis.

Dashboards: Customizable, Collaborative and Really Cool

The new Dashboard feature offers customizable, problem-specific hubs from which to launch your analysis workflow.

Analysts have long used Savanna to create problem areas (Spaces) where they can house and organize the information that they have collected and created. For example, an analyst might create a Space to house content and findings related to their analysis work about Mexican cartel movements.

Now, Dashboards act as the home page of each Space, giving you easy access to important information and providing alerts for recent activities, uploads, and models created within a Space. For example, a fellow team member may have uploaded an ANB Chart related to your current analysis. The Dashboard provides an alert to help you stay on top of the most current information available.

4.2's new Dashboard is easy to customize for quick collaboration.


Temporal Filters for Timely Information

With Map’s new temporal filter, you can quickly filter data to reveal date and location patterns at a glance. Simply drag the filter over a specific period of time to view data points on Map relevant only to the selected period. The temporal filter also helps you view data as part of a larger historical whole, making it easy to discover trends that reveal themselves over time in a geospatial context.

Filter Map data temporally to quickly view date range and count.
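To make the idea concrete, here is a minimal sketch of a temporal filter in Python. It is not Savanna's implementation: the `MapPoint` type, the `filter_by_period` function, and the sample points are all hypothetical, and the sketch only assumes that each data point carries a location and a date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MapPoint:
    """A hypothetical geospatial data point with an observation date."""
    label: str
    lat: float
    lon: float
    observed: date


def filter_by_period(points, start, end):
    """Return the points observed between start and end (inclusive) and their count."""
    selected = [p for p in points if start <= p.observed <= end]
    return selected, len(selected)


# Invented sample data, for illustration only.
points = [
    MapPoint("A", 19.43, -99.13, date(2014, 11, 2)),
    MapPoint("B", 25.69, -100.32, date(2015, 1, 15)),
    MapPoint("C", 20.67, -103.35, date(2015, 2, 20)),
]

selected, count = filter_by_period(points, date(2015, 1, 1), date(2015, 1, 31))
print(count, [p.label for p in selected])
```

Dragging the filter across a timeline amounts to changing `start` and `end` and re-running the selection, which is why the date range and the count can update together.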


To keep up with upcoming releases and new features, follow our blog. In the meantime, you can keep yourself entertained by watching Savanna in action on our YouTube page at http://www.youtube.com/ThetusCorp. Until next time.

Investigating Visa Fraud with Savanna

Although it’s not a story often portrayed in the news, visa fraud is a widespread problem with a multitude of variables, making it tricky to track and prevent. One type of visa that is of particular interest is the H-2B visa for temporary or seasonal nonagricultural labor. The US plans to admit 66,000 workers under H-2B visas in 2015, and the cap of 33,000 for the first half of the year was reached on January 26th (“Cap Count for H-2B Nonimmigrants”). While many applications are legitimate, and criminal prosecutions for H-2B violations are rare, abuse of the program is common (“Officials at N.C. company International Labor Management are charged with visa fraud”). Employers ask for more workers than they need, or ask for workers for longer than the standard seasonal period. Unlike the H-2A program, the H-2B program does not make employers responsible for housing costs, so H-2B workers cost employers less. This makes fraud and abuse of the H-2B visa program more prevalent.

Analyst's Notebook Chart finds connections between pending H-2B applicant and suspicious sponsor employer


Combining Savanna, our all-source analysis software, with the powerful data analysis of IBM Analyst’s Notebook, Thetus decided to take a closer look at H-2B visa fraud to determine potential prevention strategies.

With IBM’s Identity Insight and Analyst’s Notebook, we were able to visualize a suspicious pending H-2B application and all of its related connections. This helped us identify and flag several pending H-2B applications of concern that have multiple sponsor contact names but only one sponsor employer, which is a common pattern of fraudulent activity.
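The pattern described above (one sponsor employer, several sponsor contact names) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Identity Insight or Analyst’s Notebook detect it; the record fields, the `employers_with_many_contacts` function, and every value in the sample data are hypothetical.

```python
from collections import defaultdict

# Invented records, for illustration only; real application data will look different.
applications = [
    {"applicant": "Applicant 1", "employer": "Employer X", "contact": "Contact A"},
    {"applicant": "Applicant 2", "employer": "Employer X", "contact": "Contact B"},
    {"applicant": "Applicant 3", "employer": "Employer X", "contact": "Contact C"},
    {"applicant": "Applicant 4", "employer": "Employer Y", "contact": "Contact D"},
    {"applicant": "Applicant 5", "employer": "Employer Y", "contact": "Contact D"},
]


def employers_with_many_contacts(records, min_contacts=2):
    """Map each employer to its distinct contact names, keeping only
    employers that have at least min_contacts different contacts."""
    contacts_by_employer = defaultdict(set)
    for record in records:
        contacts_by_employer[record["employer"]].add(record["contact"])
    return {
        employer: sorted(contacts)
        for employer, contacts in contacts_by_employer.items()
        if len(contacts) >= min_contacts
    }


flagged = employers_with_many_contacts(applications)
print(flagged)
```

A flag like this is a starting point for review rather than proof of fraud; the threshold `min_contacts` is an assumption you would tune against known cases.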

After importing the Analyst’s Notebook Chart built around the suspicious H-2B applicant and his connections into Savanna, we can capture and expand on discoveries made in Analyst’s Notebook and find out more information on the suspicious sponsor employer.

We build a Crumbnet, Savanna’s mind-mapping tool, to outline the discoveries from Analyst’s Notebook and frame our analysis in narrative form. Savanna’s dynamic Occurrence documents help us compile existing knowledge and connections on each pending H-2B applicant and the suspicious sponsor employer. A quick keyword Search reveals previously built Charts, uploaded by another Savanna user, that identify the suspicious sponsor employer as a previously investigated company with ties to drug cartels. At this point, it is clear that a larger visa fraud investigation is needed, so we compile our findings in a Note to share with team members and send to investigators for further action.

View each step of the visa fraud analysis in further detail in the full demo video below:

References:

“Cap Count for H-2B Nonimmigrants,” USCIS.gov, last modified March 4, 2015, http://www.uscis.gov/working-united-states/temporary-workers/cap-count-h-2b-nonimmigrants

Ken Otterbourg, “Officials at N.C. Company International Labor Management are Charged with Visa Fraud,” Washingtonpost.com, last modified February 20, 2014, http://www.washingtonpost.com/politics/officials-at-nc-company-are-indicted-for-falsifying-visas-in-guest-worker-program/2014/02/20/f30b1a1a-9985-11e3-b931-0204122c514b_story.html

Artisanal Handcrafted Ontologies

Here at Thetus, we use the OWL Web Ontology Language to create our semantic knowledge models. Engineers working with OWL typically edit their models with tools like Protégé or TopBraid Composer, but sometimes you want or need to create a model by hand. Semantic Modelers at Thetus often edit ontologies by hand because our knowledge modeling engine, Publisher, uses our own in-house OWL serialization syntax, known as Thetus Markup Language (TML). TML is an XML syntax that was created as a friendlier alternative to RDF/XML.

Maintaining a proprietary serialization syntax and the toolchain to support it is a lot of work, so we’ve begun to consider alternatives.  As the Semantic Web has matured, many serialization syntaxes for RDF and OWL have been proposed, and choosing one can be a little overwhelming.  For handcrafted ontologies, many Semantic Web veterans recommend using Turtle, but they don’t often explain why it’s the best choice.  Working with our own syntax has given modelers at Thetus some strong opinions about what we were looking for in a new one, and since we weren’t very familiar with many of the available syntaxes, we decided to do some research and draw our own conclusions.

The modeling team at Thetus reviewed about a dozen syntaxes for this effort: OWL Functional Syntax, OWL/XML, RDF/XML, Manchester Syntax, Turtle, N-Triples, JSON-LD, Notation 3 (N3), TriG, TriX, and N-Quads. With the help of various syntax conversion tools, we converted some ontologies we’ve worked with to each of these syntaxes so we could assess their readability. We also spent a lot of time reviewing the documentation for each syntax and scouring the web for expert opinions on them. We developed a set of criteria for evaluating the syntaxes, weighted the criteria, and then ranked the syntaxes against them.

[Figure: OWL serialization decision matrix]

Our criteria for editing models by hand fell into two main categories. First, the syntax should be easy for humans to read and write. During the review, four human-friendly qualities stood out:

  1. The syntax provides an alternative to using absolute IRIs, such as namespace prefixes or relative IRIs.
  2. The syntax doesn’t force the author to write flat triples, offering features such as inline or nested blank nodes, collections, or lists.
  3. The syntax provides a shorthand for literals and common predicates.
  4. The syntax is not XML, as many people do not enjoy typing numerous angle brackets.
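To see what the first and third qualities buy you, consider a Turtle statement such as `ex:Owl a owl:Class .` and the flat, absolute-IRI triple it stands for. The Python sketch below expands the prefixed names and the `a` shorthand (Turtle's abbreviation for `rdf:type` in predicate position) into an N-Triples-style line. It is a toy, not a Turtle parser: it handles only prefixed names, and the `ex:` namespace and function names are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The owl: namespace is the standard one; the ex: namespace is hypothetical.
PREFIXES = {
    "ex": "http://example.org/zoo#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_name(name, prefixes):
    """Expand a prefixed name such as 'ex:Owl' into an absolute IRI in angle brackets."""
    prefix, _, local = name.partition(":")
    return f"<{prefixes[prefix]}{local}>"


def expand_predicate(name, prefixes):
    """Like expand_name, but also expands the 'a' shorthand for rdf:type."""
    if name == "a":
        return f"<{RDF_TYPE}>"
    return expand_name(name, prefixes)


def to_flat_triple(subject, predicate, obj, prefixes):
    """Write one triple the long way: three absolute IRIs and a terminating dot."""
    return " ".join([
        expand_name(subject, prefixes),
        expand_predicate(predicate, prefixes),
        expand_name(obj, prefixes),
    ]) + " ."


print(to_flat_triple("ex:Owl", "a", "owl:Class", PREFIXES))
```

The short form is 20 characters; the expanded line is well over 100. Multiply that by every statement in an ontology and the appeal of prefixes and shorthand for hand editing becomes clear.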

Second, the syntax should have good support from existing Semantic Web tools, such as Apache Jena, the OWL API and various RDF and OWL conversion tools, because you don’t want to invest a lot of effort into building a model only to find that existing tools can’t read it. Since we were focused on hand-editing models, we didn’t take into consideration whether Protégé or TopBraid Composer supported the syntaxes.

Given the human-friendly qualities above, it became easy to rule out several syntaxes immediately.  We learned that four of the syntaxes are closely related: Turtle, N3, N-Triples and TriG, which we began to refer to collectively as the Turtle family. Of the Turtle family, N-Triples lacks the first three qualities, so it was easy to rule out.

Of the other syntaxes, N-Quads also lacks the first three qualities. OWL/XML and TriX fare poorly on the second quality because they lack nesting features. All three of the normative syntaxes (OWL Functional, OWL/XML and RDF/XML) lack shorthand features. OWL/XML, RDF/XML, and TriX were further ruled out because they are all XML formats. And if avoiding XML is important, JSON-LD also starts to look less attractive, because it requires typing lots of curly braces and is somewhat verbose.

This leaves us with three of the four Turtle family syntaxes—Turtle, N3, TriG—and the Manchester Syntax.  In our weighted ranking, they were in a four-way tie for first place across the human-friendly ease of use qualities, with good support for all four of our desired qualities and only minor differences in their coverage of the third quality.

We used our second main category of criteria, tool support, to break this tie, which is how Turtle ended up leading the pack. It has the most support of any serialization syntax other than RDF/XML, which is the only syntax that OWL 2 tools are required to support. Additionally, if you are a SPARQL user, familiarity with Turtle is useful because it has a great deal of overlap with the syntax of SPARQL’s WHERE clause.
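For readers who want to run a similar evaluation, here is a minimal Python sketch of a weighted decision matrix. It is not our actual matrix: the weights and the tool-support scores are placeholders invented for illustration, only three syntaxes are included, and the 0/1 values for the four human-friendly qualities are one reading of the discussion above.

```python
# Hypothetical weights; choose your own to reflect what matters to your team.
WEIGHTS = {
    "prefixes": 3,      # quality 1: alternative to absolute IRIs
    "nesting": 2,       # quality 2: no forced flat triples
    "shorthand": 2,     # quality 3: shorthand for literals and common predicates
    "not_xml": 1,       # quality 4: not an XML format
    "tool_support": 2,  # second category: support in existing tools
}

# Illustrative scores only. Qualities are 0 or 1; tool_support is a placeholder scale.
SCORES = {
    "Turtle":    {"prefixes": 1, "nesting": 1, "shorthand": 1, "not_xml": 1, "tool_support": 2},
    "N-Triples": {"prefixes": 0, "nesting": 0, "shorthand": 0, "not_xml": 1, "tool_support": 2},
    "RDF/XML":   {"prefixes": 1, "nesting": 1, "shorthand": 0, "not_xml": 0, "tool_support": 3},
}


def rank(scores, weights):
    """Return (syntax, weighted total) pairs, highest total first."""
    totals = {
        syntax: sum(weights[criterion] * values[criterion] for criterion in weights)
        for syntax, values in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for syntax, total in rank(SCORES, WEIGHTS):
    print(f"{syntax}: {total}")
```

The useful part of the exercise is less the final number than being forced to write down the criteria and argue about the weights.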

So now you know why Thetus thinks Turtle is the best syntax for writing ontologies by hand! What do you think?

~ Marijane White, Principal Engineer

Savanna 4.1 Has Arrived

With a collective sigh and a rush of engineers to the keg, Savanna 4.1, our latest release, headed out the door. As with every new Savanna release, we’ve added even more features to make your analysis experience fun and effective.

There were numerous enhancements to a variety of features in this release, but we’ll focus on two of our major enhancements: taggable Occurrences and heatmap visualizations.

Occurrence: Dynamic Documents for Dynamic Data

In 4.1, Savanna’s Occurrence tool continues to allow analysts to create connected information networks by capturing detailed, problem-specific information about people, organizations, places and events within dynamic documents. For example, you might make a Person Occurrence to gather information about the current Portland Mayor, or a Place Occurrence for Portland City Hall.

Because data isn’t always easily defined or categorized, Occurrences now allow analysts to customize information beyond the basic data fields.

Here are two of our new key customization tools:

  • Descriptions: Now you can add fine-grained detail to your Occurrences by entering new description names and types to capture and easily find specifics about your Occurrences. For example, you could add a physical description to your Portland City Hall Occurrence and fill in information from a specific article or website.

  • Tags: Tags quickly reveal important keywords or phrases about an Occurrence. With the new tags fields, you can easily capture categories of an Occurrence that might not be available in a given template. For example, an analyst might make an Architectural Style tag name, and then fill in “Italian Renaissance” as the associated style.

These customizable options allow you to make the most of your data, giving you the ability to capture hard-to-define information and build interconnected networks for holistic analysis.

Map: Introducing a Hot New Feature (pun entirely intended)

Maps are everywhere, and digital maps are as commonly used today as atlases were in the ’90s (an atlas, if you remember, is a collection of maps printed on paper). It’s important, then, that a map does more than show you how to get from point A to point B. Savanna’s Map tool geospatially visualizes data in any integrated or constructed standard open geospatial format, with content filtering and visibility settings that allow you to quickly view the most relevant information.

4.1’s heatmap enhancement allows you to capture and view trends in specific Map regions, giving you a bird’s eye view of your data that is customizable by radius, display color and opacity.

You can keep up with upcoming releases and new features here on our blog and, in the meantime, keep yourself entertained by watching Savanna in action on our YouTube page at http://www.youtube.com/ThetusCorp. Until next time.

Metadata: Data That Describes Data

Say you need a chair and your friend has one that he would be happy to give you. Before you agree to haul it away in a borrowed pick-up, you want to know more about the chair. Is it comfortable? Is it hard to clean? Will it match your décor? Is it the right size for your space?

So you ask your friend a few questions to get more information. What kind of upholstery does the chair have? How long is it? Does it recline? What color is it? All of these questions are meant to provide context for you to decide whether the chair suits your needs. In other words, you are gathering data about the chair. If you think of the chair as a single piece of data (or stated another way, as a data set containing just one record), then the description of the chair is its metadata. Put simply, metadata is data about data.

Whether your data is a physical object, a digital file, or a row in a spreadsheet, metadata is necessary to understand the value, function, and history of your data. If you learned that your friend’s chair was the Ron Arad Big Easy Chair (image at left), you might think twice before taking it. More realistically, you might learn that the chair reclines, has vinyl upholstery, and is 84” wide.

Similarly, imagine reading an undated news article about an outbreak of a deadly disease in a nearby city. Without knowing the article’s publication date, you wouldn’t know if the outbreak is spreading rapidly or if it occurred 20 years ago! Suppose you later learn that the article came from The Onion, a satirical newspaper. You would then assume the article is exaggerated or untrue.

Metadata grows in importance with the quantity of data under consideration. If you have 100,000 chairs to choose from, you need ways to differentiate between them without examining each individually. Identifying metadata common to many chairs would help distinguish collections of like data within your data set. For instance, you might only be interested in recliners or accent chairs. Many chairs will fall into multiple collections, leaving it up to you to decide which are of primary importance.
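A small Python sketch shows how shared metadata lets you carve a large data set into collections without inspecting each record. The field names, the `group_by` and `select` helpers, and the three sample chairs are all hypothetical; a real data set would have its own schema.

```python
from collections import defaultdict

# Each record is one chair; everything besides "id" is metadata about it.
chairs = [
    {"id": 1, "style": "recliner", "upholstery": "vinyl", "width_in": 84},
    {"id": 2, "style": "accent", "upholstery": "fabric", "width_in": 30},
    {"id": 3, "style": "recliner", "upholstery": "leather", "width_in": 40},
]


def group_by(records, field):
    """Collect record ids into collections keyed by the value of one metadata field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["id"])
    return dict(groups)


def select(records, **criteria):
    """Return the ids of records whose metadata matches every given criterion."""
    return [
        record["id"]
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


print(group_by(chairs, "style"))
print(select(chairs, style="recliner", upholstery="vinyl"))
```

Note that chair 1 belongs to both the "recliner" collection and the "vinyl" collection, which is the situation described above: the same record can fall into several collections, and you decide which grouping matters most.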

Savanna, Thetus’s multi-source analysis solution, incorporates metadata throughout the analysis workspace. Users can capture details about people, organizations, places, and concepts in preformatted metadata fields or freeform narratives. Options for describing connections between different types of data allow for easy identification of related content. Privacy settings specify rules about who can access specific data. Cumulatively, Savanna’s metadata tools enable users to contextualize information and thus reap greater insight into complex problems.

Different types of metadata gain prominence depending on the user and what kind of information they need. You asked your friend questions relevant to a chair, but your query would have been different if you were discussing a song, a map, or a car. Similarly, different Savanna users present varied forms of information relevant to their problem space. A geospatial analyst in the military assessing locations of weapons stockpiles cares about different data (and thus metadata) than an emergency planner assessing earthquake hazard mitigation strategies. In most cases, users have predefined metadata based on organizational or disciplinary standards.

Regardless of users’ specific needs, the importance of metadata remains. Metadata can be simple and intuitive, as when thinking about the characteristics of a chair, or it can be formalized, detailed, and discipline-specific. Given the potential for metadata to profoundly influence your interpretation of information, you should treat metadata as an integral part of any data set.

~ Rebecca Davies, Analyst

Reference:

“Metadata Guide Working Level.” Australian National Data Service. June, 2011. Accessed December 29, 2014. http://ands.org.au/guides/metadata-working.pdf.

Image Source: “Ron Arad – The Big Easy chair in chrome steel” by Ron Arad – Own work. Licensed under CC BY 3.0 via Wikimedia Commons – http://commons.wikimedia.org/wiki/File:Ron_Arad_-_The_Big_Easy_chair_in_chrome_steel.jpg#mediaviewer/File:Ron_Arad_-_The_Big_Easy_chair_in_chrome_steel.jpg

Flat Design

You know the old saying “less is more”? That seems to be the mantra for the current trend in user interface design. We’re seeing less and less 3D object design and more flat design. Less clutter, less fuss and more openness are becoming the norm. And now the trend has been solidified with Apple’s recent iOS 7 release.

Apple's 3D design compared to their new flat design


But with simplicity comes a great deal of consideration. Without the bells and whistles (gradients, drop shadows, etc.), the focus is solely on the shapes and color choices of the design. With less text, the design needs to explain itself visually while staying minimal. Have we become so familiar with digital user interfaces that the need to create 3D design to emulate real-world objects is a thing of the past? Or is this cleaner approach to design just another trend?

Thetus Savanna Webcast Tomorrow!

Register to view our webcast tomorrow! 

Wednesday, October 23rd at 11am PST/ 2pm EST

Discover how Savanna, our all-source analysis suite, helps make sense of information and pivots data as new knowledge evolves so you can turn insight into a solution.
